Abstract

We  explore  the  constrained  efficient  observational  learning  model  —  as  when 
individuals  care  about  successors,  or  are  so  induced  by  an  informationally- 
constrained  social  planner.  We  find  that  when  the  herding  externality  is  cor- 
rectly internalized  in  this  fashion,  incorrect  herds  still  obtain. 

To  describe  behaviour  in  this  model,  we  exhibit  a  set  of  indices  that  capture 
the  privately  estimated  social  value  of  every  action.  The  optimal  decision 
rule  is  simply:  Choose  the  action  with  the  highest  index.  While  they  have 
the  flavour  of  Gittins  indices,  they  also  incorporate  the  potential  to  signal  to 
successors.  We  then  apply  these  indices  to  establish  a  key  comparative  static, 
that  the  set  of  stationary  'cascade'  beliefs  strictly  shrinks  as  the  planner  grows 
more  patient.  We  also  show  how  these  indices  yield  a  set  of  history-dependent 
transfer  payments  that  decentralize  the  constrained  social  optimum. 

The  lead  inspiration  for  the  paper  is  our  proof  that  informational  herding 
is  but  a  special  case  of  myopic  single  person  experimentation.  In  other  words, 
the  incorrect  herding  outcome  is  not  a  new  phenomenon,  but  rather  just  the 
familiar  failure  of  complete  learning  in  an  optimal  experimentation  problem. 
Herding as Experimentation Deja Vu." Still earlier, the proposed mapping appeared in July 1995 as a
later section in our companion paper, "Pathological Outcomes of Observational Learning." We thank
Abhijit Banerjee, Meg Meyer, Christopher Wallace, and seminar participants at the MIT theory lunch, the
Stockholm School of Economics, the Stony Brook Summer Festival on Game Theory (1996), Copenhagen
University, and the 1997 European Winter Meeting of the Econometric Society (Lisbon) for comments on
various versions. Smith gratefully acknowledges financial support for this work from NSF grants
SBR-9422988 and SBR-9711885, and Sørensen equally thanks the Danish Social Sciences Research Council.
1.   INTRODUCTION

The last few years have seen a flood of research on informational herding. We ourselves
have  actively  participated  in  this  research  herd  (Smith  and  Sorensen  (1997a),  or  SS)  that 
was  sparked  independently  by  Banerjee  (1992)  and  BHW:  Bikhchandani,  Hirshleifer,  and 
Welch  (1992).  The  context  is  seductively  simple:  An  infinite  sequence  of  individuals  must 
decide  on  an  action  choice  from  a  finite  menu.  Everyone  has  identical  preferences  and 
menus,  and  each  may  condition  his  decision  both  on  his  endowed  private  signal  about  the 
state  of  the  world,  and  on  all  predecessors'  decisions  (but  cannot  see  their  private  signals). 

In this context, SS showed that beliefs converge upon a cascade, i.e. where only one
action is taken with probability one.1 BHW and Banerjee showed that a 'herd' eventually
arises: namely, after some point, all decision-makers (DMs) make the identical choice,
possibly unwise. Clarifying their next result, SS also showed that this herd is ex post
incorrect with positive probability iff the DMs' private signals are uniformly bounded in
strength. This simple pathological outcome has understandably attracted much fanfare.

The main thrust of this paper is an analysis of the herding externality with forward-looking
behavior. Contrary to the popular impression of incorrect herds as a market
failure, we show that herding is constrained-efficient: Even when DMs internalize the
herding externality by placing very low weight on their private gain, incorrect herds obtain
whenever private signals are boundedly powerful; however, they occur with a vanishing
chance as the discount factor tends to one. The DMs' lack of concern for successors affects
just the extent of incomplete learning, and not its existence.

Social  information  is  poorly  aggregated  by  action  observation,2  as  individuals  may 
choose actions that reveal almost none of their information. But suppose that early DMs
either wished to help latecomers, or were so induced, by taking more revealing actions that
better signalled their private information. Exactly what instructions should a planner give
to the DMs to maximize social welfare? We provide a compact description of optimal
behaviour  in  this  constrained  efficient  herding  model.  We  produce  a  set  of  indices  that 
capture  the  privately  estimated  social  value  of  every  action.  The  optimal  decision  rule  is 
simply  to  choose  the  action  with  the  highest  index.  While  they  have  the  flavour  of  Gittins' 
(1979)  multi-armed  bandit  indices,  they  also  must  incorporate  the  potential  to  signal  to 
successors.  So  unlike  Gittins'  perfect  information  context,  an  action's  index  is  not  simply 
its  present  social  value.  Rather,  individuals'  signals  are  hidden  from  view,  and  therefore 
social  rewards  must  be  translated  into  private  incentives  using  the  marginal  social  value. 


1As convergence may take infinite time (see SS), we have a limit cascade, as opposed to BHW's cascade.
2Dow  (1991)  and  Meyer  (1991)  have  also  studied  the  nature  of  such  a  coarse  information  process  for 
different  contexts:  search  theory  and  organizational  design. 


We apply these indices to establish a key comparative static, that the cascade belief set
strictly shrinks when DMs grow more patient.

Our second application of the indices is to the equivalent problem of the informationally-constrained
social planner, whose goal is to maximize the present discounted value of all
DMs' welfare. For the altruistic herding fiction is only worthy of study if it can be
decentralized. A social planner must encourage early DMs to sacrifice for the informational
benefit of posterity. Such an outcome can be decentralized as a constrained social optimum
by means of a simple set of history-dependent balanced-budget transfers. These transfers
are given by our indices, and have a rather intuitive economic meaning.

This  paper  was  originally  sparked  by  a  simple  question  about  informational  herding: 
Haven't  we  seen  this  before?  We  were  piqued  by  its  similarity  to  the  familiar  failure  of 
complete  learning  in  an  optimal  experimentation  problem.  Rothschild's  (1974)  analysis  of 
the  two-armed  bandit  is  a  classic  example:  An  impatient  monopolist  optimally  experiments 
with  two  possible  prices  each  period,  with  fixed  but  uncertain  purchase  chances  for  each 
price.  Rothschild  showed  that  the  monopolist  (i)  eventually  settles  down  on  one  of  the 
two prices, and (ii) selects the less profitable price with positive probability. To us, this
had  the  clear  ring  of:  (i)  an  action  herd  occurs,  and  (ii)  with  positive  chance  is  misguided. 

This paper also formally justifies this intuitive link. We prove that informational
herding is not a new phenomenon, but a camouflaged context for an old one: myopic single
person experimentation, with possible incomplete learning. Our proof respects the
quintessence of the herding paradigm, that predecessors' signals be hidden from view. In a
nutshell, we replace all DMs by agent machines that automatically map any realized private
signals into action choices; the true experimenter then must furnish these automata with optimal
history-contingent 'decision rules'. We therefore reinterpret actions in the herding model as
the experimenter's stochastic signals, and the DMs' decision rules as his allowed actions.
We perform this formal embedding for a very general observational learning context.3

The  organization  of  this  paper  is  as  follows.  As  we  must  first  solve  the  experimentation 
problem  anyway,  it  makes  more  sense  to  proceed  backwards,  ending  with  the  economics. 
So  section  2  describes  a  general  infinite  player  observational  learning  model,  and  then 
re-interprets  it  as  an  optimal  single-person  experimentation  model.  Focusing  on  the  finite- 
action  informational  herding  model,  section  3  characterizes  the  experimenter's  limiting 
beliefs.  The  altruistic  herding  model  is  introduced  in  section  4,  and  optimal  strategies  are 
described  using  index  rules;  these  are  then  applied  for  our  key  comparative  static,  as  well 
as  a  description  in  section  5  of  the  optimal  transfers  for  the  equivalent  planner's  problem. 
A  conclusion  affords  a  broader  perspective  on  our  findings.  Some  proofs  are  appendicized. 


3 Such  an  embedding  is  well-known  and  obvious  for  rational  expectations  pricing  models  (eg.  Scheinkman 
and  Schechtman  (1983)),  since  the  price  is  publicly  observed,  and  an  inverse  mapping  is  not  required. 


2.   TWO  EQUIVALENT  LEARNING  MODELS 

In  this  section,  we  first  set  up  a  rather  general  observational  learning  model,  that  sub- 
sumes SS,  and  thus  BHW  and  Banerjee  (1992).  All  models  in  this  class  are  then  formally 
embedded  in  the  experimentation  framework.  Afterwards,  we  specialize  our  findings. 

2.1   Observational  Learning 

Information. An infinite sequence of decision-makers (DMs) $n = 1, 2, \ldots$ takes
actions in that exogenous order. There is uncertainty about the payoffs from these actions.
The elements of the parameter space $(\Omega, \mathcal{F})$ are referred to as states of the world. There
is a given common prior belief, the probability measure $\nu$ over $\Omega$.

The $n$th DM observes a partially informative private random signal $\sigma_n \in \Sigma$ about the
state of the world. As shown in SS (Lemma 1), we may assume WLOG that the private
signal received by a DM is actually his private belief, i.e. we let $\sigma$ be the measure over
$\Omega$ which results from Bayesian updating given the prior $\nu$ and observation of the private
signal. Signals thus belong to $\Sigma$, the space of probability measures over $(\Omega, \mathcal{F})$, equipped
with its associated sigma-algebra. Conditional on the state, the signals are assumed to be i.i.d.
across DMs, drawn according to the probability measure $\mu^\omega$ in state $\omega \in \Omega$. To avoid
trivialities, assume that not all $\mu^\omega$ are (a.s.) identical, so that some signals are informative.
Each distribution may contain atoms, but to ensure that no signal will perfectly reveal the
state of the world, we insist that all $\mu^\omega$ be mutually absolutely continuous (a.c.), for $\omega \in \Omega$.

Bayesian Decision-Making. Everyone chooses from a fixed action set $A$, equipped
with the sigma-algebra $\mathcal{A}$. Action $a$ earns a nonstochastic payoff $u(a, \omega)$ in state $\omega \in \Omega$,
the same for all DMs, where $u : A \times \Omega \to \mathbb{R}$ is measurable. It is common knowledge that
everyone is rational, i.e. seeks to maximize his expected payoff. Before deciding upon an
action, everyone first observes his private signal/belief and the entire action history $h$.

Each DM's Bayes-optimal decision uses the observed action history and his own private
belief. As in SS, we simply assume that a DM can compute the behaviour of all
predecessors, and can use the common prior to calculate the ex ante (time-0) probability
distribution over action profiles $h$ in either state. Knowing these probabilities, Bayes' rule
implies a unique public belief $\pi = \pi(h) \in \Sigma$ for any history $h$. A final application of Bayes'
rule given the private belief $\sigma$ yields the posterior belief $\rho \in \Sigma$.

Given the posterior belief $\rho$, the $n$th DM picks the action $a \in A$ which maximizes his
expected payoff $u_a(\rho) = \int_\Omega u(a, \omega)\, d\rho(\omega)$. We assume that such an optimal action $a = a(\rho)$
exists.4 The solution defines an optimal decision rule $x$ from $\Sigma$ to $\Delta(A)$, the space of
probability measures over $(A, \mathcal{A})$. Let $X$ be the space of such maps $x : \Sigma \to \Delta(A)$. A rule
$x$ produces an implied distribution over actions simultaneously for all private beliefs $\sigma$.


4 Absent a unique solution, we must take a measurable selection from the solution correspondence.

The Stochastic Process of Beliefs. Since the optimal $x$ depends on $\pi$, and
the probability measure of signals $\sigma$ depends on the state $\omega$, the implied distribution over
actions $a$ depends on both $\omega$ and the optimal decision rule $x$. The density is
$\psi(a|\omega,x) = \int_\Sigma x(\sigma)(a)\, \mu^\omega(d\sigma)$, and unconditional on the state, it is
$\psi(a|\pi,x) = \int_\Omega \psi(a|\omega,x)\, \pi(d\omega)$. This in turn yields a distribution over next period
public beliefs $\pi_{n+1}$. Thus, $(\pi_n)$ follows a
discrete-time Markov process with state-dependent transition chances.
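
The transition just described can be sketched in a few lines, assuming (purely for illustration) finitely many states, private beliefs and actions, so that the integrals become sums. The function names `action_density` and `update_public_belief`, and the numbers in the example, are hypothetical and not taken from the paper.

```python
from typing import Dict, Hashable, List

Action = Hashable
State = Hashable


def action_density(rule: List[Dict[Action, float]],
                   mu: List[float]) -> Dict[Action, float]:
    """Chance of each action in one state, i.e. psi(a | omega, x).

    rule[i] is the distribution over actions that the rule x assigns to the
    i-th private belief; mu[i] is the chance of that private belief in the
    given state (a finite stand-in for the measure mu^omega).
    """
    psi: Dict[Action, float] = {}
    for probs, weight in zip(rule, mu):
        for action, prob in probs.items():
            psi[action] = psi.get(action, 0.0) + prob * weight
    return psi


def update_public_belief(pi: Dict[State, float],
                         psi_by_state: Dict[State, Dict[Action, float]],
                         action: Action) -> Dict[State, float]:
    """Bayes-update the public belief pi after observing `action`."""
    joint = {w: pi[w] * psi_by_state[w].get(action, 0.0) for w in pi}
    total = sum(joint.values())  # this is psi(a | pi, x)
    if total == 0.0:
        raise ValueError("action has zero probability under pi and x")
    return {w: j / total for w, j in joint.items()}


# Hypothetical example: two private beliefs (weak and strong evidence for H),
# and a rule that sends the first to action "a1" and the second to "a2".
rule = [{"a1": 1.0}, {"a2": 1.0}]
psi = {"H": action_density(rule, [0.3, 0.7]),
       "L": action_density(rule, [0.7, 0.3])}
posterior = update_public_belief({"H": 0.5, "L": 0.5}, psi, "a2")
```

Because the next belief depends on the current one only through $\pi$ and the rule $x$, iterating `update_public_belief` along a realized action sequence traces out one path of the Markov process.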

2.2  Informational  Herding  as  Experimentation  Deja  Vu 

And  out  of  old  bookes,  in  good  faithe, 
Cometh  al  this  new  science  that  men  lere. 

—  Geoffrey  Chaucer5 

Our  immediate  goal  is  to  recast  the  observational  problem  outcome  as  a  single  person 
optimization.  A  first  stab  brings  us  to  the  forgetful  experimenter,  who  each  period  receives 
a  new  informative  signal,  takes  an  optimal  action,  and  then  promptly  forgets  his  signal; 
the  next  period,  he  can  reflect  only  on  his  action  choice.  But  this  is  not  a  model  of  Bayes- 
optimal  experimentation,  since  it  assumes  and  in  fact  requires  irrational  behaviour.  How 
then  can  an  experimenter  not  observe  the  private  signals,  and  yet  take  informative  actions? 

To  give  proper  context  to  our  resolution,  it  helps  to  consider  McLennan  (1984).  This 
nice  sequel  to  Rothschild's  work  permitted  the  monopolist  to  charge  one  of  a  continuum  of 
prices,  but  assumed  only  two  possible  linear  purchase  chance  'demand  curves'.  McLennan 
found  that  the  resulting  uninformative  price  when  the  demand  curves  crossed  may  well 
eventually  be  chosen  by  an  optimizing  monopolist. 

Rothschild's and McLennan's models give examples of potentially confounding actions,
as introduced in EK: Easley and Kiefer (1988). In brief, such actions are optimal for
unfocused beliefs for which they are invariants (i.e. taking the action leaves the beliefs
unchanged). Of particular significance is the proof in EK (on page 1059) that with finite
state and action spaces, potentially confounding actions generically do not exist, and
thus complete learning must arise.6 Rothschild and McLennan might be seen as separate
anticipations of EK's general insight. Rothschild escapes it by means of a continuous
state space, whereas McLennan resorts to a continuous action space. Yet there appears no
escape for the herding paradigm, where both flavours of incomplete learning, limit cascades
and confounded learning (see SS), generically arise with two actions and two states. This
puzzle suggests the inverse mapping that we now consider.


5See The Assembly of Fowles. Line 22.

6E.g. payoffs in a one-armed bandit, with a potentially confounding safe arm, are not generic in $\mathbb{R}^2$.


Observational Learning Model                        Impatient Experimenter Model

State: $\omega \in \Omega$                          State: $\omega \in \Omega$
Public belief after $n$th DM: $\pi_n$               Belief after $n$ observations: $\pi_n$
Optimal decision rule: $x \in X$                    Optimal action: $x \in X$
Private signal of $n$th DM: $\sigma_n$              Randomness in the $n$th experiment: $\sigma_n$
Action taken by each DM: $a \in A$                  Observable signals: $a \in A$
Density over actions: $\psi(a|\omega,x)$            Density over observables: $\psi(a|\omega,x)$
Payoffs: private information                        Payoffs: unobserved

Table 1: Embedding. This table displays how our single-type observational learning
model fits into the impatient single person experimentation model.

In recasting our general observational learning model as a single person experimentation
problem, we must focus on the myopic experimenter with discount factor 0 (ruling out
active experimentation). Steering away from a forgetful experimenter, we shall regard
the observational learning story from a new perspective. Consider the $n$th DM, who uses
both the public belief $\pi_n$ and his private signal $\sigma_n$ in forming and acting upon his posterior
beliefs $\rho_n$. We may separate these two steps by the conditional independence of $\pi_n$ and $\sigma_n$.
Regard Mr. $n$ as: (i) observing $\pi_n$, but not his private signal; (ii) optimally determining
the rule $x \in X$, and submitting it to an agent 'choice' machine; and (iii) letting that
machine observe his private signal and take his action $a \in A$ for him. The payoff $u(a,\omega)$
is unobserved, lest that provide an additional signal of the state of the world. If private
beliefs $\sigma$ have distribution $\mu^\omega$ in state $\omega$, then the experimenter chooses the same optimal
decision rule $x$ described in section 2.1, resulting in action $a \in A$ with chance $\psi(a|\omega,x)$.

Thus, the observational learning model corresponds to a single-person experimentation
model where: The state space is $\Omega$. In period $n$, the experimenter EX chooses an action
(the rule) $x \in X$. Given this choice, a random observable statistic $a \in A$ is realized with
chance $\psi(a|\omega,x)$ in state $\omega$. Finally, EX updates beliefs using this information alone.7
Table 1 summarizes the embedding.

Notice how this addresses both lead puzzles. First, the experimenter never knows the
private beliefs $\sigma$, and thus does not forget them. Second, incomplete learning (bad herds)
is entirely consistent with EK's generic finding of complete learning for models with finite
action and state spaces. Simply put, actions do not map to actions but to signals when one
rewrites the observational learning model as an experimentation model. The true action
space for EX is the infinite space $X$ of decision rules.


7This model doesn't strictly fit into the EK mold, where stage payoffs depend only on the action and
the observed signal, but (unlike here) not on the parameter $\omega \in \Omega$. This is the structure of Aghion, Bolton,
Harris, and Jullien (1991) (ABHJ), who admit unobserved payoffs. Alternatively, we could posit that EX
has fair insurance, and only sees/earns his expected payoff each period and not his random realized payoff.

SS considered two major modifications of the informational herding paradigm. One
was to add i.i.d. noise ('crazy' preference types) to the DM's decision problem. Noise is
easily incorporated here by adding an exogenous chance of a noisy signal (random action).
SS also allowed for $T$ different types of preferences, with DMs randomly drawn from one
or the other type population. Multiple types can be addressed here by simply imagining
that EX chooses a $T$-vector of optimal decision rules from $X^T$ with (only) the choice
machine observing the type and private belief, and choosing the action $a$ as before.

3.   THE  PATIENT  EXPERIMENTER 

3.1   The  Reformulated  Model 

From  now  on,  we  restrict  ourselves  to  the  more  focused  herding  analytic  framework  - 
a  two  state,  finite  action  setting,  as  in  SS.  More  states  complicates  but  does  not  enrich. 

We assume a state space $\Omega = \{H, L\}$, with both states equilikely ex ante, i.e. having
prior $\nu(L) = \nu(H) = 1/2$. Private belief $\sigma$ is the chance of state $H$, so that $\Sigma = [0, 1]$.
Let $\mathrm{supp}(\mu)$ be the common support of each probability measure $\mu^\omega$ over private beliefs
(i.e. noise for EX's problem). If $\mathrm{supp}(\mu) \subset (0, 1)$, then private beliefs are bounded; they
are unbounded if $\mathrm{co}(\mathrm{supp}(\mu)) = [0, 1]$, i.e. if arbitrarily strong private beliefs exist. The
half-bounded, half-unbounded case is a direct sum of these separate analyses.

We make the standard herding assumption of a finite action space $A = \{a_1, \ldots, a_M\}$.
We assume that no action is dominated, yielding the standard interval structure that action
$a_m$ is optimal exactly when the posterior $\rho$ is in some sub-interval of $[0, 1]$. WLOG, we can
then order the actions such that $a_m$ is myopically optimal for posteriors $\rho \in [\bar r_{m-1}, \bar r_m]$,
where $0 = \bar r_0 < \bar r_1 < \ldots < \bar r_M = 1$.
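
As a rough illustration of this interval structure, the posterior cutoffs can be computed from the state-contingent payoffs, since each action's expected payoff is affine in the posterior. The sketch below is not from the paper: the function name `myopic_thresholds` and the example payoffs are hypothetical, and it simply raises an error when the no-dominated-action assumption fails.

```python
from typing import List, Tuple


def myopic_thresholds(payoffs: List[Tuple[float, float]]) -> Tuple[List[int], List[float]]:
    """Order actions and find the posterior cutoffs where the myopic optimum switches.

    payoffs[i] = (u(a_i, H), u(a_i, L)). The expected payoff at posterior r is
    u(a_i, L) + r * (u(a_i, H) - u(a_i, L)), an affine function of r, so the
    optimal action switches where consecutive lines (sorted by slope) cross.
    Returns (order, cuts), where action order[m] is optimal on [cuts[m], cuts[m+1]].
    """
    order = sorted(range(len(payoffs)), key=lambda i: payoffs[i][0] - payoffs[i][1])
    cuts = [0.0]
    for i, j in zip(order, order[1:]):
        (uh_i, ul_i), (uh_j, ul_j) = payoffs[i], payoffs[j]
        slope_gap = (uh_j - ul_j) - (uh_i - ul_i)
        if slope_gap <= 0.0:
            raise ValueError("two actions have the same slope; one is dominated")
        cuts.append((ul_i - ul_j) / slope_gap)  # posterior at which i and j tie
    cuts.append(1.0)
    # With no dominated action, the cutoffs must be strictly increasing in [0, 1].
    if any(lo >= hi for lo, hi in zip(cuts, cuts[1:])):
        raise ValueError("some action is dominated")
    return order, cuts


# Hypothetical payoffs: a1 pays only in state L, a3 only in state H, a2 is safe.
order, cuts = myopic_thresholds([(0.0, 1.0), (0.6, 0.6), (1.0, 0.0)])
```

In this example the safe action is optimal only for intermediate posteriors; lowering its payoff far enough makes it dominated, and the function then raises instead of returning cutoffs.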

A strategy $s_n$ for period $n$ is a map from $\Sigma$ to $X$. It prescribes the rule $x_n \in X$ which
must be used, given belief $\pi_n$. The planner chooses a strategy profile $s = (s_1, s_2, \ldots)$,
which in turn determines the stochastic evolution of the model, i.e. a distribution over
the sequences of realized actions, payoffs, and beliefs.

The Value Function. Our analysis here follows ABHJ and sections 9.1-2 of
Stokey and Lucas (1989). The value function $v(\cdot,\delta) : \Sigma \to \mathbb{R}$ for the planning problem
with discount factor $\delta$ is $v(\pi,\delta) = \sup_s E[(1-\delta) \sum_{n=1}^{\infty} \delta^{n-1} u_n \mid \pi]$, where the expectation is
over the payoff sequences implied by $s$. Recall that $u_m(\pi) = \pi u(a_m, H) + (1-\pi) u(a_m, L)$
denotes the expected payoff from $a_m$ at belief $\pi$. Since $u_m$ is affine, the Bellman equation is

$$v(\pi,\delta) = \sup_{x \in X} \Big\{ \sum_{m=1}^{M} \psi(a_m|\pi,x) \big[ (1-\delta)\, u_m(q(\pi,x,a_m)) + \delta\, v(q(\pi,x,a_m),\delta) \big] \Big\} \qquad (1)$$

where $q(\pi, x, a)$ is the Bayes-updated belief from $\pi$ when $a$ is observed and rule $x$ is applied.
A (Markov) policy for EX is a map $\xi : [0, 1] \to X$ (so the rule given belief $\pi$ is $\xi(\pi)$).
The optimum in a Markovian decision problem with discount factor $\delta$ exists (e.g. ABHJ,
Theorem 4.1), and is achieved by some such policy, generically written $\xi^\delta$. In summary:

Lemma 1     For any discount factor $\delta < 1$, EX has an optimal policy $\xi^\delta : [0, 1] \to X$.
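
For reference, the updated belief used in the Bellman equation can be written out explicitly in this two-state setting. The display below is only Bayes' rule applied to the action densities defined in section 2.1, added here as a worked restatement rather than a formula shown in the source; the last identity is the martingale property of beliefs for any fixed rule $x$.

```latex
q(\pi,x,a) \;=\; \frac{\pi\,\psi(a|H,x)}{\psi(a|\pi,x)},
\qquad
\psi(a|\pi,x) \;=\; \pi\,\psi(a|H,x) + (1-\pi)\,\psi(a|L,x),
\qquad
\sum_{a \in A} \psi(a|\pi,x)\, q(\pi,x,a) \;=\; \pi .
```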

Interval Structure. SS shows that the myopic experimenter maps higher beliefs
into higher actions: There are thresholds $0 = \theta_0 < \theta_1 < \ldots < \theta_M = 1$ depending on
$\pi$ alone, such that action $a_m$ is strictly optimal when $\sigma \in (\theta_{m-1}, \theta_m)$, and indifference
between $a_m$ and $a_{m+1}$ prevails at the knife-edge $\sigma = \theta_m$. This is also true with
patience, as Lemma 2 proves. Intuitively, not only does the interval structure respect the
action order to yield high immediate payoffs, but it also ensures the greatest information
value, by producing the riskiest posterior belief distribution.8
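
To make the dependence of the myopic thresholds on $\pi$ concrete, one can invert Bayes' rule. The display below is a worked restatement under the assumptions of section 3.1 (an equilikely prior, so that posterior odds are the product of public and private odds), writing $\bar r_m$ for the posterior cutoff between $a_m$ and $a_{m+1}$ and $\theta_m(\pi)$ for the matching private-belief cutoff when $0 < \pi < 1$. It covers only the myopic case and is not a formula displayed in the source.

```latex
\rho(\pi,\sigma) \;=\; \frac{\pi\sigma}{\pi\sigma + (1-\pi)(1-\sigma)},
\qquad
\rho\big(\pi,\theta_m(\pi)\big) = \bar r_m
\;\Longleftrightarrow\;
\theta_m(\pi) \;=\; \frac{\bar r_m\,(1-\pi)}{\bar r_m\,(1-\pi) + (1-\bar r_m)\,\pi} .
```

At $\pi = 1/2$ this reduces to $\theta_m = \bar r_m$, and a higher public belief lowers every cutoff, so that weaker private evidence suffices for a higher action.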

Lemma 2  For the belief $\pi$ of EX, any optimal rule $x \in X$ is almost surely described by
thresholds $0 = \theta_0 < \theta_1 < \cdots < \theta_M = 1$ such that action $a_m$ is taken when $\sigma \in (\theta_{m-1}, \theta_m)$,
and EX randomizes between $a_m$ and $a_{m+1}$ when $\sigma = \theta_m$.

Proof: We prove that any rule $x$ without an interval structure can be strictly improved
upon. For $m_1 < m_2$, let $\Sigma_i$ ($i = 1, 2$) be those beliefs in $\Sigma$ mapped with positive probability
into $a_{m_i}$. Assume to the contrary that the sets are not almost surely ordered as $\Sigma_1 < \Sigma_2$.

If posteriors are perversely-ordered as $q(\pi, x, a_{m_1}) > q(\pi, x, a_{m_2})$, then given our action
ordering, payoffs are strictly improved with no loss of information by remapping beliefs
leading to $a_{m_1}$ into $a_{m_2}$, and vice versa. That is, the myopic payoff is strictly improved,
since $u_{m_1} - u_{m_2}$ is a decreasing function, while the continuation value is unchanged.

Next, assume that $q(\pi, x, a_{m_1}) < q(\pi, x, a_{m_2})$. For any $\theta \in (0, 1)$, define
$\Sigma_1(\theta) = (\Sigma_1 \cup \Sigma_2) \cap [0, \theta]$ and $\Sigma_2(\theta) = (\Sigma_1 \cup \Sigma_2) \cap [\theta, 1]$. Consider then the modified rule $\tilde x$ which
equals $x$, except that $a_{m_1}$ is taken for beliefs in $\Sigma_1(\theta)$ and $a_{m_2}$ for beliefs in $\Sigma_2(\theta)$, and
where $\theta$ satisfies $\psi(a_{m_1}|\pi, \tilde x) = \psi(a_{m_1}|\pi, x)$. (It may be necessary for $\tilde x$ to randomize over the
two actions at belief $\theta$ to accomplish that.) Since beliefs more in favour of state $L$ are mapped
into $a_{m_1}$ under $\tilde x$, we find $q(\pi, \tilde x, a_{m_1}) \le q(\pi, x, a_{m_1})$, and similarly
$q(\pi, \tilde x, a_{m_2}) \ge q(\pi, x, a_{m_2})$, with at
least one inequality strict. Thus, $\tilde x$ yields a mean preserving spread of the updated belief
versus $x$, and since the continuation value function is weakly convex, its expectation is
weakly improved. But as we have just argued, the myopic payoff is strictly improved.    □


8This  problem  is  not  without  history.    Sobel  (1953)  investigated  an  interval  structure  in  a  simple 
statistical  decision  problem,  and  more  recently,  Dow  (1991)  in  a  two  period  search  model. 
Our earlier assumption that no action is dominated is no longer so innocent, and
may in fact have some bite for very patient EX. For the information value alone, taking
dominated actions in principle provides additional signalling power. Essentially, it leaves
each DM with a larger finite alphabet with which to signal his continuous private belief.
But  including  such  actions  would  invalidate  the  above  proof,  and  thus  perhaps  the  interval 
structure.  We  leave  this  stone  unturned,  acknowledging  the  possible  loss  of  generality. 

3.2  Long  Run  Behavior 

As  is  generally  the  case  with  Bayesian  learning,  convergence  is  deduced  by  application 
of  the  martingale  convergence  theorem  to  the  belief  process.  Not  only  must  beliefs  settle 
down,  but  also  EX  is  never  dead  wrong  about  the  state.  A  proof  is  found  in  SS. 

Lemma 3  The belief process $(\pi_n)$ is a martingale unconditional on the state, converging
a.s. to some limiting random variable $\pi_\infty$. The limit $\pi_\infty$ is concentrated on $(0, 1]$ in state $H$.

The next result is an expression of EK's Theorem 5 that the limit belief $\pi_\infty$ precludes
further learning. In the informational herding model, this is only possible during a cascade,
when one action is chosen almost surely, and thus is uninformative. The next
characterization of the stationary points of the stochastic process of beliefs $(\pi_n)$ directly
generalizes the analysis for $\delta = 0$ in SS. See figure 1 for an illustration of how the cascade
sets are reflected in the shape of the optimal value function.

Proposition 1 (Cascade Sets)  There exist $M$ (possibly empty) subintervals of $[0,1]$,
$J_1(\delta) < \cdots < J_M(\delta)$, such that EX optimally chooses $x$ a.s. inducing $a_m \in A$ iff $\pi \in J_m(\delta)$.

(a) For all $\delta \in [0, 1)$, the limit belief $\pi_\infty$ is concentrated on the sets $J_1(\delta) \cup \cdots \cup J_M(\delta)$.

(b) With unbounded private beliefs, the extreme cascade sets are nonempty with $J_1(\delta) = \{0\}$
and $J_M(\delta) = \{1\}$, and all other $J_m(\delta)$ are empty.

(c) If the private beliefs are bounded, then $J_1(\delta) = [0, \underline{\pi}(\delta)]$ and $J_M(\delta) = [\bar\pi(\delta), 1]$, where
$0 < \underline{\pi}(\delta) < \bar\pi(\delta) < 1$. For large enough $\delta$, all cascade sets disappear except for $J_1$ and $J_M$,
while $\lim_{\delta \to 1} J_1(\delta) = \{0\}$ and $\lim_{\delta \to 1} J_M(\delta) = \{1\}$.

Proof: All but the initial limit belief result are established in the appendix. To see why
that one is true (that a limit cascade must occur, as SS call it), observe that for
any belief $\pi$ not in any cascade set, at least two signals in $A$ are realized with positive
probability. By the interval structure, the highest such signal is more likely in state $H$, and
the lowest more likely in state $L$. So the next period's belief differs from $\pi$ with positive
probability. Intuitively, or by the characterization result for Markov-martingale processes
in appendix B of SS, it cannot lie in the support of $\pi_\infty$. □
Figure 1: Typical value function. Stylized graph of $v(\pi,\delta)$, $\delta > 0$, with three actions.
The proof of this result also shows that the larger is $\delta$, the weakly smaller are all cascade
sets: Indeed, this drops out rather easily from the monotonicity of the value function
in $\delta$. We defer asserting this result for now, and in fact Proposition 5 leverages weak
monotonicity, and the index rules that we introduce later on, to deduce strict monotonicity.
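
A small numerical sketch can illustrate how a cascade set shrinks with patience. It is a deliberately restricted toy version of EX's problem, not the paper's model in full: two actions with expected payoffs $1-\pi$ and $\pi$, a binary private signal that is 'high' with chance `p` in state $H$ and $1-p$ in state $L$, and only three deterministic rules (pool on either action, or separate), whereas the planner in the text may also randomize. The names `interp` and `low_cascade_edge`, the grid, and all parameter values are hypothetical; values between grid points are linearly interpolated, so the output is approximate.

```python
from typing import List


def interp(values: List[float], q: float) -> float:
    """Linearly interpolate values stored on an even grid over [0, 1]."""
    n = len(values) - 1
    pos = min(max(q, 0.0), 1.0) * n
    i = min(int(pos), n - 1)
    w = pos - i
    return (1.0 - w) * values[i] + w * values[i + 1]


def low_cascade_edge(delta: float, p: float = 0.7, n: int = 200,
                     sweeps: int = 400, tol: float = 1e-9) -> float:
    """Largest grid belief b such that pooling on the low action is optimal on [0, b]."""
    grid = [k / n for k in range(n + 1)]
    v = [max(1.0 - b, b) for b in grid]  # start from the best pooling payoff

    def separate(b: float) -> float:
        """Value of letting the action reveal the signal at public belief b."""
        psi_hi = b * p + (1.0 - b) * (1.0 - p)   # chance of the high action
        psi_lo = 1.0 - psi_hi
        q_hi = b * p / psi_hi                     # belief after the high action
        q_lo = b * (1.0 - p) / psi_lo             # belief after the low action
        flow = psi_lo * (1.0 - q_lo) + psi_hi * q_hi
        cont = psi_lo * interp(v, q_lo) + psi_hi * interp(v, q_hi)
        return (1.0 - delta) * flow + delta * cont

    for _ in range(sweeps):
        # Pooling is uninformative, so the belief (and continuation value) stays put.
        v = [max((1.0 - delta) * max(1.0 - b, b) + delta * vb, separate(b))
             for b, vb in zip(grid, v)]

    edge = 0.0
    for b, vb in zip(grid, v):
        if vb > (1.0 - b) + tol:  # some informative rule strictly beats pooling low
            break
        edge = b
    return edge


myopic_edge = low_cascade_edge(0.0)
patient_edge = low_cascade_edge(0.9)
```

In this toy setting the myopic cascade set for the low action ends at about $1 - p$, and the patient planner's set is strictly smaller, in line with the weak monotonicity described above.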

Proposition 2 (Convergence of Beliefs)     Consider a solution of EX's problem.

(a) For unbounded private beliefs, $\pi_\infty$ is concentrated on the truth for any $\delta \in [0, 1)$.

(b) With bounded private beliefs, learning is incomplete for any $\delta \in [0, 1)$: Unless
$\pi_0 \in J_M(\delta)$, there is positive probability in state $H$ that $\pi_\infty$ is not in $J_M(\delta)$.

(c) The chance of incomplete learning with bounded private beliefs vanishes as $\delta \uparrow 1$.

Proof: Part (a) follows from Lemma 3 and Proposition 1-a,b, and part (b) just as in
Theorem 1 of SS. We now extend that proof to establish the limiting result for $\delta \uparrow 1$ in
part (c). First, Proposition 1 assures us that for $\delta$ close enough to 1, $\pi_\infty$ places all weight
in $J_1(\delta)$ and $J_M(\delta)$. The likelihood ratio $\ell_n = (1 - \pi_n)/\pi_n$ is a martingale conditional
on state $H$. Because the likelihood ratio $(1 - \sigma)/\sigma$ is bounded above by some $\bar\ell < \infty$ for all
private beliefs $\sigma$, the sequence $(\ell_n)$ is bounded above by $\bar\ell\,(1 - \underline{\pi}(\delta))/\underline{\pi}(\delta)$, and the mean
of $\ell_\infty$ must equal its prior mean $(1 - \pi_0)/\pi_0$. Since $\lim_{\delta \to 1} \underline{\pi}(\delta) = 0$, the weight that $\pi_\infty$
places on $J_1(\delta)$ in state $H$ must vanish as $\delta \to 1$. □
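
The last step of this argument can be spelled out as a Markov-type inequality. The display below is a worked restatement using the objects named in the proof (the likelihood ratio $\ell_n = (1-\pi_n)/\pi_n$ and the upper endpoint $\underline{\pi}(\delta)$ of $J_1(\delta)$); it is added for clarity and is not displayed in the source. On the event $\pi_\infty \in J_1(\delta)$ we have $\ell_\infty \ge (1-\underline{\pi}(\delta))/\underline{\pi}(\delta)$, and $\ell_\infty \ge 0$ always, so

```latex
\Pr\big(\pi_\infty \in J_1(\delta) \,\big|\, H\big)\cdot
\frac{1-\underline{\pi}(\delta)}{\underline{\pi}(\delta)}
\;\le\; E\big[\ell_\infty \,\big|\, H\big] \;=\; \frac{1-\pi_0}{\pi_0}
\quad\Longrightarrow\quad
\Pr\big(\pi_\infty \in J_1(\delta) \,\big|\, H\big)
\;\le\; \frac{1-\pi_0}{\pi_0}\cdot\frac{\underline{\pi}(\delta)}{1-\underline{\pi}(\delta)}
\;\longrightarrow\; 0
\quad\text{as } \delta \to 1 .
```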

Observe how incomplete learning besets even an extremely patient EX. So this problem
does not fall under the rubric of EK's Theorem 9, where it is shown that if the optimal
value function $v$ is strictly convex in beliefs $\pi$, learning is complete for $\delta$ near 1. For here,
EX optimally behaves myopically for very extreme beliefs: $v(\pi) = u_1(\pi)$ for $\pi$ near 0, and
$v(\pi) = u_M(\pi)$ for $\pi$ near 1, both affine functions. This points to the source of the
incomplete learning: lumpy signals (actions) rather than impatience. It is simply individuals'
inability to properly signal their private information that frustrates the learning process.


4.   ALTRUISTIC  HERDING  AND  INDEX  RULES 

4.1  The  Welfare  Theorem 

We now shift focus from the problem facing EX to the informational herding context
with a sequence of DMs. For organizational simplicity, we first suppose that every
DM is altruistic, but subject to the usual informational herding constraints (observable
actions, unobservable signals). Define an altruistic herding equilibrium (AHE) as a Bayes-
Nash equilibrium of the game where every DM $n = 1, 2, \ldots$ seeks to maximize the present
discounted welfare of all posterity, themselves included: $E[(1-\delta)\sum_{k=n}^{\infty}\delta^{k-n}u_k \mid \pi]$. Define
the expected reward of the payoff function $f$ as $E[f \mid \pi] = \sum_{\omega\in\Omega}\pi(\omega)f(\omega)$.

The  next  result  is  quite  natural,  but  is  proved  for  clarity. 

Proposition 3  For any discount factor $\delta < 1$, any optimal policy $\xi^\delta$ for EX is an AHE.
Consequently, an AHE exists.

Proof: Fix a given DM and a public belief $\pi$. Assume that $\xi^\delta$ is the behaviour strategy
of all successors in an AHE, but that the DM has some rule $x$ that is a better reply than
is $\xi^\delta(\pi)$. Then EX can improve his value at $\pi$ by fully mimicking this deviation, i.e. by
(i) taking $x$ in the first period and thereafter (ii) continuing with $\xi^\delta$ as if the first period
history had been generated by $\xi^\delta(\pi)$. This contradicts the optimality of EX's policy.  □

4.2  Choosing  the  Best  Action 

Recall the classical problem of the multi-armed bandit:9 A given patient experimenter
each period must choose one of $n$ actions, each having an uncertain independent reward
distribution. The experimenter must therefore carefully trade off the informational and
myopic payoffs associated with each action. Gittins (1979) showed that optimal behavior
in this model can be described by simple index rules: Attach to each action the value of
the problem with just that action and the largest possible lump sum retirement reward
yielding indifference. Finally, each period, just choose the action with the highest index.

We now argue that the optimal policy employed by the decision makers in an AHE has
a similarly appealing form: For a given public belief $\pi$ and private belief $p$, DM chooses
the action $a_m$ with the largest index $w_m^\delta(\pi,p)$. This measure will incorporate the social
payoff, as did the Gittins index, but as privately estimated by the DM.

Before stating the next major result, recall that $\partial g(y)$ denotes the subdifferential of
the convex function $g$ at $y$, namely the set of all $\lambda$ that obey $g(x) \ge g(y) + \lambda\cdot(x-y)$
for all $x$. Moreover, by (strict) convexity, $\partial g(y)$ is (strictly) increasing in $y$.


9 An excellent, if brief, treatment is found in §6.5 of Bertsekas (1987).

Proposition 4 (Index Rules)  Fix any AHE strategy $\xi^\delta$. For $m = 1, 2, \ldots, M$, there
exist $\lambda_m \in \partial v(q(\pi,\xi^\delta(\pi),a_m),\delta)$ such that the average expected present discounted value of
action $a_m$ to the decision maker faced with public belief $\pi$ and private belief $p$ is

$$w_m^\delta(\pi,p) = (1-\delta)\,u_m(p(\pi,p)) + \delta\,\{v(q(\pi,\xi^\delta(\pi),a_m),\delta) + \lambda_m[p(\pi,p) - q(\pi,\xi^\delta(\pi),a_m)]\} \qquad (2)$$

where $p(\pi,p) = \pi p/[\pi p + (1-\pi)(1-p)]$ is the posterior of $\pi$ given $p$. So the optimal decision
rule is to take action $a_m$ when $w_m^\delta(\pi,p) = \max_k w_k^\delta(\pi,p)$.
Proof: Fix a given decision maker DM facing public belief $\pi$.

We now calculate the payoffs from each of the $M$ available actions $a_1, \ldots, a_M$. Action
$a_m$ of the DM induces the public posterior belief $q_m = q(\pi,\xi^\delta(\pi),a_m)$, and corresponding
state-contingent average expected discounted future payoffs of his successors, say $v_m^H$ and
$v_m^L$. Clearly, the DM's expected value of any such vNM payoff stream is affine in his
posterior belief $p$, i.e. of the form $h_m(p) = E[v_m^\omega \mid p] = p\,v_m^H + (1-p)\,v_m^L$. To evaluate these
payoff streams, it suffices to employ the EX's reckoning, since the DM and EX entertain
the same future payoff objectives. Because the affine function $h_m$ presumes the behaviour
which is optimal starting at belief $q_m$, we have $h_m(q_m) = v(q_m)$. Next, by employing the
same strategy starting at an arbitrary public belief $r$ as at $q_m$, EX can achieve the value
$h_m(r)$; therefore, $h_m(r) \le v(r)$. Thus, the slope of this affine function necessarily lies in
the subdifferential $\partial v(q_m)$. The present value expression (2) follows.  □
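
To make the index rule concrete, here is a minimal Python sketch of expression (2) and of the rule "choose the action with the highest index". It takes the continuation values $v(q_m)$, the subgradients $\lambda_m$ and the public posteriors $q_m$ as given inputs; computing them requires solving EX's problem, which this sketch does not attempt. The function names, the two payoff functions and all numbers are hypothetical illustrations.

```python
def posterior(pi, p):
    """Posterior chance of state H from public belief pi and private belief p."""
    return pi * p / (pi * p + (1 - pi) * (1 - p))


def action_index(delta, u_m, post, q_m, v_at_q_m, lam_m):
    """Index (2): myopic payoff plus the tangent-line estimate of the
    continuation value at the public posterior q_m, evaluated at post."""
    return (1 - delta) * u_m(post) + delta * (v_at_q_m + lam_m * (post - q_m))


def best_action(delta, payoffs, post, q, v_at_q, lam):
    """Return the (zero-based) position of the highest index and all indices."""
    indices = [
        action_index(delta, u, post, q_m, v_m, lam_m)
        for u, q_m, v_m, lam_m in zip(payoffs, q, v_at_q, lam)
    ]
    return max(range(len(indices)), key=indices.__getitem__), indices


# Hypothetical inputs (illustrative numbers, not derived from the model).
payoffs = [lambda r: 0.5, lambda r: r]   # u_1 is safe, u_2 pays 1 in state H
post = posterior(0.5, 0.7)
choice, indices = best_action(
    delta=0.0, payoffs=payoffs, post=post,
    q=[0.3, 0.7], v_at_q=[0.55, 0.72], lam=[0.1, 0.9],
)
```

With `delta=0.0` the indices collapse to the myopic payoffs at the private posterior, so the rule reduces to myopic choice; for positive `delta` the tangent-line term is what carries the value of signalling to successors.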

That EX can always ensure himself a payoff function tangent to the value function,
simply by not adjusting his policy, was critical to this proof. This simple idea also
implies convexity of the value function (e.g. Lemma 2 of Fusselman and Mirman (1993)).

4.3   Strict  Inclusion  of  Cascade  Sets 

We  next  use  our  index  rule  characterization  to  prove  a  key  comparative  static  of  our 
forward-looking  informational  herding  model:  As  individuals  grow  more  patient,  the  set 
of  cascade  beliefs  which  foreclose  on  learning  strictly  shrinks. 

Before  proceeding,  we  need  two  preliminary  lemmata. 

Lemma 4 (Strict Value Monotonicity)  The value function increases strictly with $\delta$
outside the cascade sets: for $\delta_2 > \delta_1$, $v(\pi,\delta_2) > v(\pi,\delta_1)$ for all $\pi \notin J_1(\delta_2) \cup \cdots \cup J_M(\delta_2)$.

The detailed proof of this result is appendicized, but the idea is quite straightforward.
Provided EX's strategy in some future eventuality strictly prefers a non-myopic action,
his continuation value must strictly exceed his myopic value. We then show that this holds
for any continuation public belief outside both cascade sets $J_m(\delta_1) \supseteq J_m(\delta_2)$. So a more
patient player, who more highly weights the continuation value, will enjoy a higher value.

We also exploit the fact that we can characterize differentiability of the value function
at the edge of cascade sets.10 For context, it is in general very hard to identify primitive
assumptions which guarantee the differentiability of our value function. Call rules $x$ and $y$
equivalent if they can be represented by the same thresholds (with associated mixing).

Lemma 5 (Differentiability)  Let $\pi \ne 0, 1$ be an endpoint of cascade set $J_m(\delta)$. Assume
that all rules optimal at $\pi$ are equivalent. Then $v(\cdot,\delta)$ is differentiable in the belief at $\pi$.

Write the Bellman equation (1) as $v = T_\delta v$, and call $T_\delta$ the Bellman operator. As
usual, $v \ge v'$ implies $T_\delta v \ge T_\delta v'$. Also, $T_\delta$ is a contraction, and $v(\cdot,\delta)$ is its unique fixed
point in the space of bounded, continuous, weakly convex functions.

We can finally establish the major comparative static of this paper, that if EX is
indifferent about foreclosing on further learning at some belief (i.e. barely in a cascade),
then he strictly prefers to learn if he is slightly more patient.

Proposition 5 (Strict Inclusion)  Assume bounded beliefs. All non-empty cascade sets
shrink strictly when $\delta$ rises: $\forall a_m \in A$, if $\delta_2 > \delta_1$ and $J_m(\delta_1) \ne \emptyset$, then $J_m(\delta_2) \subset J_m(\delta_1)$.
Proof: Let $r = \inf J_m(\delta_1)$, the left edge of the cascade set. As Step 4 of the proof of
Proposition 1 asserts $J_m(\delta_2) \subseteq J_m(\delta_1)$, and $J_m(\delta_1) = \{\pi \mid v(\pi,\delta_1) - u_m(\pi) = 0\}$ is closed by
continuity of $v(\pi,\delta_1) - u_m(\pi)$ in $\pi$, we need only prove $r \notin J_m(\delta_2)$. There are two cases.

Case 1. Assume that at public belief $r$ and with discount factor $\delta_1$, some optimal
rule $x$ does not almost surely take action $a_m$. Instead, with positive probability, $x$ takes
some action $a_k$ producing a posterior belief $q(r,x,a_k)$ not in any $\delta_1$-cascade set. [For since
$a_m$ is myopically optimal at $r \in J_m(\delta_1) \subseteq J_m(0)$, the optimal rule $x$ cannot almost surely
induce any other myopically suboptimal action $a_j$ ($j \ne m$) at a stationary belief.] So from
Lemma 4, $v(q(r,x,a_k),\delta_2) > v(q(r,x,a_k),\delta_1) \ge u_k(q(r,x,a_k))$, and since we can always
employ the rule $x$ with the discount factor $\delta_2$, we must have

$$
\begin{aligned}
v(r,\delta_2) &\ge \sum_{a_j \in A} \psi(a_j|r,x)\,[(1-\delta_2)\,u_j(q(r,x,a_j)) + \delta_2\,v(q(r,x,a_j),\delta_2)] \\
&> \sum_{a_j \in A} \psi(a_j|r,x)\,[(1-\delta_1)\,u_j(q(r,x,a_j)) + \delta_1\,v(q(r,x,a_j),\delta_1)] = v(r,\delta_1)
\end{aligned}
$$

Consequently, we have $v(r,\delta_2) > v(r,\delta_1) = u_m(r)$ and so $r \notin J_m(\delta_2)$.

Case 2. Next suppose that the optimal rule at public belief $r$ with discount factor
$\delta_1$ is unique. Then the partial derivative $v_1(r,\delta_1)$ exists by Lemma 5. By the convexity
of the value function, any selection from the subdifferential $\partial v(\pi)$ converges to $v_1(r,\delta_1)$
as $\pi$ increases to $r$. Since the optimal rule correspondence is upper hemicontinuous by
the Maximum Theorem, and uniquely valued at $r$, the posterior belief $q(\pi,\xi^{\delta_1}(\pi),a_k)$ is
continuous in $\pi$ at $r$ for any optimal rule selection $\xi^{\delta_1}$ and any action $a_k$.

10 We thank Rabah Amir, David Easley, Andrew McLennan, Paul Milgrom, Len Mirman, and Yaw
Nyarko for private discussions about the differentiability of the value function in experimentation problems.

Let $b = \inf \mathrm{supp}(\mu)$ be the lower endpoint of the private belief distribution. As the
optimal rule at $r$ almost surely prescribes action $a_m$, we let $q(r,\xi^{\delta_1}(r),a_m) = r$ and
$q(r,\xi^{\delta_1}(r),a_{m-1}) = p(r,b)$. By their definition, $w_m^{\delta_1}(\pi,p)$ and $w_{m-1}^{\delta_1}(\pi,p)$ are then jointly
continuous in $(\pi,p)$ at $(r,b)$. [In the expression for $w_{m-1}^{\delta_1}$, $\lambda_{m-1}$ lies between the slopes
of $u_1$ and $u_M$, and is multiplied by a function that is continuous and vanishing at $(r,b)$,
given $q(r,\xi^{\delta_1}(r),a_{m-1}) = p(r,b)$.] Also, $w_m^{\delta_1}(r,b) \ge w_{m-1}^{\delta_1}(r,b)$ since $r$ lies in the cascade
set $J_m(\delta_1)$, while $w_m^{\delta_1}(\pi,b) < w_{m-1}^{\delta_1}(\pi,b)$ for $\pi < r$, since $r$ is the endpoint of $J_m(\delta_1)$. So
$w_m^{\delta_1}(r,b) = w_{m-1}^{\delta_1}(r,b)$ by continuity. This equality can be rewritten in a very useful form:

$$
\begin{aligned}
&u_m(p(r,b)) - u_{m-1}(p(r,b)) \qquad (3) \\
&\quad = \delta_1\,[u_m(p(r,b)) - u_{m-1}(p(r,b)) + v(p(r,b),\delta_1) - v(r,\delta_1) - \lambda_m^{\delta_1}(p(r,b) - r)]
\end{aligned}
$$

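Moreover, from the previous proof of Proposition 4, $\lambda_m^{\delta_1}$ is the slope of $u_m$, because the
function $h_m(p) = v(r,\delta_1) + \lambda_m^{\delta_1}(p - r)$ evaluates the prospect of taking action $a_m$ forever.
We shall prove that $w_m^{\delta_2}(r,b) < w_{m-1}^{\delta_2}(r,b)$, and therefore conclude that $r \notin J_m(\delta_2)$. By
way of contradiction, assume that $w_m^{\delta_2}(r,b) \ge w_{m-1}^{\delta_2}(r,b)$, i.e. $r = \inf J_m(\delta_2)$. Subtracting
$w_m^{\delta_1}(r,b) = w_{m-1}^{\delta_1}(r,b)$, we then have the following contradiction:

$$
\begin{aligned}
0 &\ge [w_{m-1}^{\delta_2}(r,b) - w_m^{\delta_2}(r,b)] - [w_{m-1}^{\delta_1}(r,b) - w_m^{\delta_1}(r,b)] \\
&= (\delta_2-\delta_1)\,[u_m(p(r,b)) - u_{m-1}(p(r,b)) - v(r,\delta_1)] \\
&\qquad + \delta_2 v(p(r,b),\delta_2) - \delta_1 v(p(r,b),\delta_1) - \delta_2\lambda_m^{\delta_2}(p(r,b) - r) + \delta_1\lambda_m^{\delta_1}(p(r,b) - r) \\
&> (\delta_2-\delta_1)\,[u_m(p(r,b)) - u_{m-1}(p(r,b)) - v(r,\delta_1)] \\
&\qquad + \delta_2 v(p(r,b),\delta_1) - \delta_1 v(p(r,b),\delta_1) - \delta_2\lambda_m^{\delta_1}(p(r,b) - r) + \delta_1\lambda_m^{\delta_1}(p(r,b) - r) \\
&= (\delta_2-\delta_1)\,[u_m(p(r,b)) - u_{m-1}(p(r,b)) - v(r,\delta_1) + v(p(r,b),\delta_1) - \lambda_m^{\delta_1}(p(r,b) - r)] \\
&= (\delta_2-\delta_1)\,[u_m(p(r,b)) - u_{m-1}(p(r,b))]/\delta_1 \ \ge\ 0
\end{aligned}
$$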

Here is a detailed justification. Under the assumption that $r \in J_m(\delta_2)$, one optimal policy $\xi^{\delta_2}$
induces $a_m$ almost surely at belief $r$, so that $q(r,\xi^{\delta_1}(r),a_{m-1}) = q(r,\xi^{\delta_2}(r),a_{m-1}) = p(r,b)$.
The first equality then follows substituting (2) for each index, and using $v(r,\delta_2) = v(r,\delta_1)$
when $r \in J_m(\delta_1) \cap J_m(\delta_2)$. The second equality is simple algebra, while the final equality
applies (3). The second inequality exploits $\lambda_m^{\delta_2} = \lambda_m^{\delta_1}$ (true as both are the slope of $u_m$),
and $v(p(r,b),\delta_2) > v(p(r,b),\delta_1)$, as given by Lemma 4. The final inequality follows since
$r \in J_m(\delta_1) \subseteq J_m(0)$, so that $a_m$ is myopically optimal at $p(r,b)$.  □

5.   CONSTRAINED  EFFICIENT  HERDING

Let us turn full circle, and consider once more the informational herding paradigm as
played by selfish individuals. Let the realized payoff sequence be $(u_k)$. Reinterpret EX's
problem as that of an informationally constrained social planner SP trying to maximize
the expected discounted average welfare $E[(1-\delta)\sum_{n=1}^{\infty}\delta^{n-1}u_n \mid \pi]$ of the individuals in the
herding model. Observe how SP's and EX's objectives are perfectly aligned. To respect
that key herding informational assumption that actions but not signals are observed, we
further assume that the SP neither knows the state nor can observe the individuals' private
signals, but can both observe and tax or subsidize any actions taken.

How does SP steer the choices away from the myopic solution to EX's problem? Given
the current public belief $\pi$, if an individual takes action $a \in A$, he then receives the
(possibly negative) transfer $\tau(a|\pi)$. A constrained herding equilibrium (CHE) is a Bayes-
Nash equilibrium of the repeated game where every DM $n = 1, 2, \ldots$ seeks to maximize
his expected one-shot myopic payoff $u(a,\pi)$ plus incurred transfers $\tau(a|\pi)$. Faced with
such incentives, our proof in Lemma 2 that individuals optimally choose private belief
threshold rules is still valid, for any transfers.

Since SP's policy is measurable in the same observed action history as was EX's
program, the best SP can do is to coax each DM to implement EX's optimal rule $x^*$.
A constrained-efficient herding equilibrium (CEHE) is a CHE where the transfers achieve
this constrained first best outcome. Existence follows at once from Lemma 1.

Lemma 6  For any discount factor $\delta < 1$, the optimal policy $\xi^\delta$ for EX is a CEHE.

Since the private belief $\sigma$ maps into the posterior $p(\pi,\sigma) = \pi\sigma/[\pi\sigma + (1-\pi)(1-\sigma)]$, the
selfish herder's threshold $\theta_m$ must solve the indifference equation $u_m(p(\pi,\theta_m)) + \tau(a_m|\pi) =
u_{m+1}(p(\pi,\theta_m)) + \tau(a_{m+1}|\pi)$. So the transfer difference $\tau(a_{m-1}|\pi) - \tau(a_m|\pi)$ alone matters
for how individuals trade off between the two actions, and the SP can ensure that the
threshold belief is optimally chosen ($\theta_m = \theta_m^*$) by suitably adjusting this net premium for
taking action $a_{m-1}$ rather than $a_m$.
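
As a small illustration of the indifference equation, the Python sketch below computes the net premium on the lower of two neighbouring actions that makes a chosen target threshold indifferent. The affine payoff functions, the target threshold and the name `net_premium` are hypothetical; in the model the target would be the threshold of EX's optimal rule, which is not computed here.

```python
def posterior(pi, sigma):
    """Posterior chance of state H from public belief pi and private belief sigma."""
    return pi * sigma / (pi * sigma + (1 - pi) * (1 - sigma))


def net_premium(u_low, u_high, pi, theta_target):
    """Transfer difference tau(a_low|pi) - tau(a_high|pi) that leaves a DM
    with private belief theta_target exactly indifferent between the two
    neighbouring actions, from u_low + tau_low = u_high + tau_high."""
    post = posterior(pi, theta_target)
    return u_high(post) - u_low(post)


# Hypothetical payoffs: the low action is safe, the high one pays 1 in state H.
u_low = lambda r: 0.5
u_high = lambda r: r

# At public belief 0.5 the myopic threshold is 0.5; pushing it up to 0.6
# requires a net premium on the low action.
premium = net_premium(u_low, u_high, pi=0.5, theta_target=0.6)
```

In this example the premium is positive, so private beliefs strictly between the myopic threshold and the target are induced to take the lower action, which they would not choose without transfers.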

We want to provide some characterization of these transfers. Clearly, SP will not tax
or subsidize actions if her desired one will be chosen anyway, i.e. for $\pi \in J_m(\delta) \subseteq J_m$.
Conversely, when $\pi \notin J_m(\delta)$, SP perforce wishes to encourage nonmyopic actions, and
some transfers intuitively will differ from zero. Indeed, $J_m(\delta) \subset J_m$ is a strict inclusion
for all $\delta > 0$ by Proposition 5, and thus transfers are not identically zero for a patient SP.

Our action indices shed some more light on the optimal transfers, beyond the mere fact
that 'experimentation' (making non-myopic choices) is rewarded. Clearly, the sum of his
transfer and myopic payoffs in a CEHE must leave every individual who should be on the
knife-edge between two neighbouring optimal actions perfectly indifferent. In other words,
we need $\tau(a_m|\pi) + u_m(p(\pi,\theta_m)) = w_m^\delta(\pi,\theta_m)/(1-\delta)$. In fact, this condition is sufficient
too, because the interval structure is optimal by Lemmata 2 and 6, and myopic incentives
alone will lead inframarginal DMs to make the correct decisions.

Proposition 6 (Optimal Transfers)  For any belief $\pi$, there exist $\lambda_1 \le \cdots \le \lambda_M$, with
$\lambda_m \in \partial v(q(\pi,\xi^\delta(\pi),a_m),\delta)$, so that the following are efficient transfers $\tau(a_m|\pi)$:

$$
\begin{aligned}
\tau(a_m|\pi) &= \{\delta\,v(q(\pi,\xi^\delta(\pi),a_m)) + \delta\lambda_m[p(\pi,\theta_m) - q(\pi,\xi^\delta(\pi),a_m)]\}/(1-\delta) \\
&= w_m^\delta(\pi,\theta_m)/(1-\delta) - u_m(p(\pi,\theta_m))
\end{aligned}
$$

Observe that incentives are unchanged if a constant is added to all $M$ transfers. Consequently,
SP may also achieve expected budget balance each period: i.e. the expected
contribution from everyone is zero, or $0 = \sum_{m=1}^{M}\psi(a_m|\pi,\xi^\delta(\pi))\,\tau(a_m|\pi)$. There is obviously
a unique set of efficient transfers that satisfies budget balance.
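
The budget-balance normalization is a one-line computation. The Python sketch below, with hypothetical transfers and action chances (the function name and numbers are illustrative assumptions), subtracts the expected transfer from every transfer, which leaves all pairwise differences, and hence all incentives, unchanged.

```python
def balance_budget(transfers, action_chances):
    """Shift all transfers by a common constant so that the expected
    transfer, weighted by the chances of each action, is zero."""
    expected = sum(c * t for c, t in zip(action_chances, transfers))
    return [t - expected for t in transfers]


# Hypothetical example with three actions (illustrative numbers only).
transfers = [0.10, 0.00, -0.05]
action_chances = [0.2, 0.5, 0.3]
balanced = balance_budget(transfers, action_chances)
expected_after = sum(c * t for c, t in zip(action_chances, balanced))
```

Because only a constant is subtracted, `balanced[0] - balanced[1]` equals `transfers[0] - transfers[1]`, so the thresholds chosen by individuals are unaffected.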

5.1   Herding  is  Constrained  Efficient 

We are now positioned to reformulate the learning results of the last section at the level
of actions. First a clarifying definition: We say that a herd arises on action $a_m$ at stage
$N$ if all individuals $n = N, N+1, N+2, \ldots$ choose action $a_m$. Observe that this differs
from the definition of a cascade; certainly, a cascade will imply a herd, but the converse
is false. To show that herds arise, we can generalize the Overturning Principle of SS to
this case: Claim 4 (statement and proof appendicized) establishes that for $\pi$ near $J_m(\delta)$,
actions other than $a_m$ will push the updated public belief far from its current value. Thus,
convergence of beliefs implies convergence of actions; put differently, a limit cascade implies a herd.
The following is thus a corollary to Proposition 2.

Proposition 7 (Convergence of Actions)  In any CEHE for discount factor $\delta$:

(a) An ex post optimal herd eventually starts for $\delta \in [0,1)$ and unbounded private beliefs.

(b) With bounded private beliefs, a herd on an action eventually starts. Unless $\pi_0 \in J_M(\delta)$,
a herd arises on an action other than $a_M$ with positive chance in state $H$ for any $\delta \in [0,1)$.

(c) The chance of an incorrect herd with bounded private beliefs vanishes as $\delta \uparrow 1$.

It is no surprise that SP ends up with full learning with unbounded beliefs, for even selfish
individuals will. More interesting is that SP optimally incurs the risk of an everlasting
incorrect herd. Herding is truly a robust property of the observational learning paradigm.

6.   CONCLUSION

This  paper  has  shown  that  informational  herding  in  models  of  observational  learning  is 
not  such  an  adverse  phenomenon  after  all:  Rather,  it  is  a  constrained  efficient  outcome  of 
the  social  planner's  problem,  and  is  robust  to  changing  the  planner's  discount  factor.  To 
understand  this  decentralized  outcome,  we  have  then  derived  an  expression  for  the  social 
present  value  of  each  action.  This  formulation  differs  from  the  Gittins  index  because  of  the 
agency  problem:  Since  private  signals  are  privately  observed,  aligning  private  and  social 
incentives  entails  a  translation  using  the  marginal  social  value.  Finally,  we  have  used  this 
expression  to  prove  a  strict  comparative  static  that  eludes  dynamic  programming  methods: 
Namely, cascade sets strictly shrink as the planner grows more patient.

This  paper  has  also  discovered  and  explored  the  fact  that  informational  herding  is 
simply  incomplete  learning  by  a  single  experimenter,  suitably  concealed.  Our  mapping, 
recasting  everything  in  rule  space,  has  led  us  to  an  equivalent  social  planner's  problem. 
While  the  revelation  principle  in  mechanism  design  also  uses  such  a  'rule  machine',  the 
exercise  is  harder  for  multi-period,  multi-person  models  with  uncertainty,  since  the  planner 
must  respect  the  agents'  belief  filtrations.  While  this  is  trivially  achieved  in  rational 
expectation  price  settings,  one  must  exploit  the  martingale  property  of  public  beliefs  with 
observational  learning,  and  largely  invert  the  model.  This  also  works  for  more  general  social 
learning  models  without  action  observation  —  provided  an  entire  history  of  posterior  belief 
signals  is  observed.  Absent  this  assumption,  the  public  belief  process  (howsoever  defined) 
ceases  to  be  a  martingale,  and  expression  as  an  experimentation  model  with  perfect  recall 
is no longer possible. This explains why our model of social learning with random sampling,
Smith and Sorensen (1997b), must employ entirely different techniques (Polya urns).

Of  course,  once  informational  herding  is  correctly  understood  as  single-person  Bayesian 
experimentation,  it  no  longer  seems  so  implausible  that  incorrect  herds  may  be  constrained 
efficient.  For  incomplete  learning  is  if  anything  the  hallmark  of  optimal  experimentation 
models, even with forward-looking behaviour. This link also offers hope for reverse insights
into the experimentation literature: As in EK, incomplete learning at the very least
requires an optimal action $x$ for which unfocused beliefs are invariant, i.e. the distribution
$\psi(a|\omega,x)$ of signals $a$ is the same for all states $\omega$. For such an invariance is clearly easier to
satisfy with fewer available signals, and not surprisingly herding and all published failures
of complete learning that we have seen assume a finite (vs. continuous) signal space. For
instance, Rothschild (1974), McLennan (1984), and ABHJ's example are all binary signal
models. More precisely, the unbounded beliefs assumption in an experimentation context
corresponds to an ability to run experiments with an arbitrarily small marginal cost (e.g.
shifting the threshold $\theta_k$ slightly up only incurs myopic costs $o(d\theta_k)$).



A.   APPENDIX 

Define the Bellman operator $T_\delta$ by letting $T_\delta v$ equal the RHS of (1). Note that for
$v \ge v'$ we have $T_\delta v \ge T_\delta v'$. As is standard, $T_\delta$ is a contraction, and $v(\cdot,\delta)$ is its unique
fixed point in the space of bounded, continuous, weakly convex functions.

A.l   Proof  of  Proposition  1 

The proposition is established in a series of steps. First, define the myopic expected
utility frontier function $v_0$ by $v_0(\pi) = \max_m u_m(\pi)$.11

Step 1 (Interval Cascade Sets)  For each $a_m \in A$, a possibly empty interval $J_m(\delta)$
exists, such that when $\pi \in J_m(\delta)$, SP optimally chooses $x$ with $\mathrm{supp}(\mu) \subseteq [\theta_{m-1},\theta_m]$, i.e.
$a_m$ occurs a.s. (learning stops). For any $\delta \in [0,1)$, $0 \in J_1(\delta)$ and $1 \in J_M(\delta)$.
Proof: For the first half, we really need only prove that $J_m(\delta)$ must be an interval. If
$\pi \in J_m(\delta)$, then $a_m$ is the optimal choice, and the value is $v(\pi,\delta) = v_0(\pi) = u_m(\pi)$.
Conversely, if $v(\pi,\delta) = v_0(\pi) = u_m(\pi)$ then $\pi \in J_m(\delta)$ and $a_m$ is the optimal choice. As
$u_m(\pi)$ is an affine function of $\pi$, and $v(\cdot,\delta)$ is weakly convex, $J_m(\delta)$ must be an interval.

For the second half, $a_1$ is myopically strictly optimal for the focused belief $\pi_n = 0$, and
since it updates to $\pi_{n+1} = \pi_n$ a.s. no matter which rule is applied, it is also dynamically
optimal for any discount factor $\delta \in [0,1)$. A similar argument holds when $\pi_n = 1$.  □

Step 2 (Iterates and Limit)  The sequence $\{T_\delta^n v_0\}$ consists of weakly convex functions
that are pointwise increasing, and converge to $v(\cdot,\delta)$. The value $v(\cdot,\delta)$ weakly exceeds $v_0$,
and strictly so outside the cascade sets: $v(\pi,\delta) > v_0(\pi)$ $\forall\delta \in [0,1)$ and $\forall\pi \notin \bigcup_{m=1}^{M} J_m(\delta)$.

Proof: To maximize $\sum_{a_m \in A}\psi(a_m|\pi,x)\,[(1-\delta)\,u_m(q(\pi,x,a_m)) + \delta\,v_0(q(\pi,x,a_m))]$ over $x$
for given $\pi$, one feasible rule $x$ almost surely chooses the myopically optimal action. Then
$q(\pi,x,x(\sigma)) = \pi$ a.s., resulting in value $v_0(\pi)$. Optimizing over all $x \in X$, $T_\delta v_0(\pi) \ge v_0(\pi)$
for all $\pi$. By induction, $T_\delta^n v_0 \ge T_\delta^{n-1} v_0$, yielding (as usual) a pointwise increasing sequence
converging to the fixed point $v(\cdot,\delta) \ge v_0$. Finally, when $\pi$ is outside the cascade sets, by
definition it is not optimal to almost surely induce one action. So, $v(\pi,\delta) > v_0(\pi)$.  □
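
The iteration in Step 2 is easy to illustrate numerically. The Python sketch below applies the Bellman operator on a belief grid for a deliberately tiny example: two actions, a binary private signal, and pure threshold rules only. All primitives (payoffs, signal chances, grid size) and all names are hypothetical assumptions made for illustration, and restricting attention to pure rules ignores the mixing that an optimal rule may require. Continuation values off the grid are obtained by linear interpolation.

```python
import numpy as np

# Hypothetical primitives (illustrative assumptions, not the paper's):
# two actions with affine payoffs in the belief pi that the state is H,
# and a binary private signal that is correct with chance 0.7.
PAYOFFS = [lambda pi: 0.5 + 0.0 * pi,    # a_1: safe action
           lambda pi: 1.0 * pi]           # a_2: pays 1 in state H only
MU_H = np.array([0.3, 0.7])               # chances of (low, high) signal in state H
MU_L = np.array([0.7, 0.3])               # chances of (low, high) signal in state L
RULES = [(0, 0), (0, 1), (1, 1)]          # action taken after (low, high) signal
GRID = np.linspace(0.0, 1.0, 201)


def myopic_frontier(grid=GRID):
    """v_0(pi): the upper envelope of the myopic payoffs u_m(pi)."""
    return np.maximum.reduce([u(grid) for u in PAYOFFS])


def bellman(v, delta, grid=GRID):
    """One application of T_delta to grid values v, maximizing over RULES."""
    best = np.full(grid.shape, -np.inf)
    for rule in RULES:
        total = np.zeros_like(grid)
        for m, u in enumerate(PAYOFFS):
            signals = [s for s in range(2) if rule[s] == m]
            if not signals:
                continue                   # action m is never taken under this rule
            chance_H = MU_H[signals].sum()
            chance_L = MU_L[signals].sum()
            psi = grid * chance_H + (1 - grid) * chance_L   # chance of action m
            q = grid * chance_H / psi                       # public posterior
            continuation = np.interp(q, grid, v)
            total += psi * ((1 - delta) * u(q) + delta * continuation)
        best = np.maximum(best, total)
    return best


def value_function(delta, tol=1e-10, max_iter=10_000):
    """Iterate T_delta from v_0 until successive iterates differ by < tol."""
    v = myopic_frontier()
    for _ in range(max_iter):
        v_next = bellman(v, delta)
        if np.max(np.abs(v_next - v)) < tol:
            return v_next
        v = v_next
    return v


v0 = myopic_frontier()
v_patient = value_function(0.9)
```

In this example the iterates rise pointwise from `v0`, and `v_patient` coincides with `v0` at the extreme beliefs while strictly exceeding it at intermediate beliefs such as 0.5, in line with the statement of Step 2.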

The  following  either  is  or  ought  to  be  a  folk  result  for  optimal  experimentation,  but  we 
have  not  found  a  published  or  cited  proof  of  it.12  At  any  rate,  it  is  here  for  completeness. 

Step 3 (Weak Value Monotonicity)  The value function is weakly increasing in $\delta$:
Namely, for $\delta_1 > \delta_2$, $v(\pi,\delta_1) \ge v(\pi,\delta_2)$ for all $\pi$.


11 Observe how this differs from $v(\pi,0) = \sup_x \sum_m \psi(a_m|\pi,x)\,u_m(q(\pi,x,a_m))$. In other words, $v(\pi,0)$
allows the myopic individual to observe one signal $\sigma$ before obtaining the ex post value $v_0(p(\pi,\sigma))$.
12 But ABHJ do assert without proof (p. 625) that the patient value function exceeds the myopic one.

Proof: Clearly, $\sum_{a_m \in A}\psi(a_m|\pi,x)\,u_m(q(\pi,x,a_m)) \le \sum_{a_m \in A}\psi(a_m|\pi,x)\,v(q(\pi,x,a_m))$ for
any $x$ and any function $v \ge v_0$. If $\delta_1 > \delta_2$, then $T_{\delta_1}v_0 \ge T_{\delta_2}v_0$, since more weight is placed
on the larger component of the RHS of (1). Because one possible policy under $\delta_1$ is to
choose the $\xi$ optimal under $\delta_2$, we have $T_{\delta_1}^n v_0 \ge T_{\delta_2}^n v_0$. Let $n \to \infty$ and apply step 2.  □

Step 4 (Weak Inclusion)  All cascade sets weakly shrink when $\delta$ increases: In other
words, $\forall a_m \in A$, if $1 > \delta_1 > \delta_2 \ge 0$, then $J_m(\delta_1) \subseteq J_m(\delta_2)$.

Proof: As seen in steps 3 and 2, $v(\pi,\delta_1) \ge v(\pi,\delta_2) \ge v_0(\pi) \ge u_m(\pi)$ for all $\pi$, when
$\delta_1 > \delta_2$. For $\pi \in J_m(\delta_1)$, we know $v(\pi,\delta_1) = u_m(\pi)$ and thus $v(\pi,\delta_2) = u_m(\pi)$. The
optimal value can thus be obtained by inducing $a_m$ a.s., so that $\pi \in J_m(\delta_2)$.  □

Step 5 (Unbounded Beliefs)  With unbounded private beliefs, only cascade sets for the
extreme actions are nonempty, with $J_1(\delta) = \{0\}$ and $J_M(\delta) = \{1\}$; all other $J_m(\delta)$ are empty.

Proof: SS establish for the myopic model that all $J_m(0)$ are empty, except for $J_1(0) = \{0\}$
and $J_M(0) = \{1\}$. Now apply steps 1 and 4.  □

Step 6 (Bounded Beliefs)  If the private beliefs are bounded, then $J_1(\delta) = [0,\underline{\pi}(\delta)]$ and
$J_M(\delta) = [\bar\pi(\delta),1]$, where $0 < \underline{\pi}(\delta) < \bar\pi(\delta) < 1$.

Proof: We prove that for sufficiently low beliefs it is optimal to choose a rule $x$ that
almost surely induces $a_1$; the argument for large beliefs is very similar. Since action $a_1$
is optimal at belief $\pi = 0$, and is not weakly dominated, it must be the optimal choice
for beliefs $\pi \le \hat\pi$, for some $\hat\pi > 0$. Thus, $u_1(\pi) = v_0(\pi)$ on $[0,\hat\pi]$. Since each $u_m$ is affine,
$u_1(\pi) > u_m(\pi) + u$ for all $m \ne 1$ for some $u > 0$, and for all beliefs $\pi$ in the interval $[0,\hat\pi/2]$.
No observation $a \in A$ can produce a stronger signal than any $\sigma \in \mathrm{supp}(\mu) \subseteq [\underline{\sigma},\bar\sigma] \subset
(0,1)$. So any initial belief $\pi$ is updated to at most $\bar q(\pi) = \pi\bar\sigma/[\pi\bar\sigma + (1-\pi)(1-\bar\sigma)]$. For $\pi$
small enough, $\bar q(\pi) \in [0,\hat\pi/2]$ and $\bar q(\pi) - \pi$ is arbitrarily small, and so $v(\bar q(\pi),\delta) - v(\pi,\delta)$ is
small by continuity of $v$; in particular, it is less than $u(1-\delta)/\delta$ for small enough $\pi$. By the
Bellman equation (1), any action $a \ne a_1$ is strictly suboptimal for such small beliefs.  □

Step 7 (Limiting Patience)  For large enough $\delta$, all cascade sets disappear except for
$J_1(\delta)$ and $J_M(\delta)$, while $\lim_{\delta\uparrow 1} J_1(\delta) = \{0\}$ and $\lim_{\delta\uparrow 1} J_M(\delta) = \{1\}$.

Proof: Fix $\delta \in [0,1)$, and an action index $m$ ($1 < m < M$) for which $J_m(\delta) = [\pi_1,\pi_2]$,
for some $0 < \pi_1 \le \pi_2 < 1$. Since there are informative private beliefs, $\exists\,\theta^* \in (1/2,1)$ with
$1 > \mu^H([\theta^*,1]) > \mu^L([\theta^*,1]) > 0$. We shall consider the alternative rule $x$, with interval
boundaries $\theta_{m-1} = 0$, $\theta_m = \theta^*$, $\theta_{m+1} = 1$ (see Lemma 2).

Updating the prior $\pi$ with the event $\{\sigma \in [\theta^*,1]\}$ results in the posterior belief $q(\pi) =
\pi\mu^H([\theta^*,1])/[\pi\mu^H([\theta^*,1]) + (1-\pi)\mu^L([\theta^*,1])]$ in state $H$. For any compact subinterval
$I \subset (0,1)$, in particular one with $I \supseteq J_m(\delta)$, there exists $\varepsilon = \varepsilon(I) > 0$ with $q(\pi) - \pi \ge \varepsilon$
for all $\pi \in I$. By definition of $\varepsilon$, $q$ maps the interval $[\pi_2 - \varepsilon/2,\pi_2]$ into (but not necessarily
onto) $[\pi_2 + \varepsilon/2,1]$. Choose $\bar u > 0$ so large that $u_m(\pi) < u_{m+1}(\pi) + \bar u$ for all $\pi \in [0,1]$.
Since $v(\pi,\delta) > u_m(\pi)$ outside $J_m(\delta) = [\pi_1,\pi_2]$, and both are continuous in $\pi$, we may also
choose $\underline{u} > 0$ so small that $v(\pi,\delta) > u_m(\pi) + \underline{u}$ for all $\pi \in [\pi_2 + \varepsilon/2,1]$. By step 3, we thus
have $v(\pi,\delta') > u_m(\pi) + \underline{u}$ for all $\delta' > \delta$. If $\delta' > \delta$ is so large that $(1-\delta')\bar u < \delta'\underline{u}$, then
the Bellman equation (1) reveals that our suggested rule $x$ beats inducing $a_m$ a.s. when
$\pi \in [\pi_2 - \varepsilon/2,\pi_2]$. By iterating this procedure a finite number of times, each time excising
length $\varepsilon/2$ from interval $J_m(\delta)$, we see that $J_m(\delta)$ evaporates for large enough $\delta$.

If $m = 1$ or $M$, apply this procedure repeatedly: $J_m(\delta) \cap I$ vanishes for $\delta$ near 1.  □

A. 2  Proof  of  Lemma  4 

We first consider a stronger version of step 3. Call the private signal distribution TS
(Two Signals) if its support contains only two isolated points (as is the case in BHW).

Claim 1 (Unreachable Cascade Sets)  Fix $\delta \ge 0$. If TS fails, then for any $\pi$ not in
any $\delta$-cascade set ($\star$): an action $a_m$ is taken with positive chance inducing a posterior
belief $q(\pi,x,a_m)$ not in any $\delta$-cascade set. If TS holds, then ($\star$) obtains for all non-cascade
beliefs $\pi$ except possibly at most $M-1$ points, each the unique belief between any pair of
nonempty cascade sets $J_{m-1}(0)$ and $J_m(0)$ from which both cascade sets can be reached.

Proof: At a non-cascade belief $\pi$, at least two actions are taken with positive chance, and
by the interval structure, some action shifts the public belief upwards while another shifts
it downwards. With unbounded beliefs, $q(\pi, x, a_m)$ never lies in a cascade set; therefore,
assume bounded beliefs. Let $\mathrm{co}(\mathrm{supp}(F)) = [\underline b, \bar b]$. Assume that $\pi$ lies between the nonempty
cascade sets $J_{m'}(0)$ and $J_m(0)$, with $m' < m$, and let $\underline\pi = \sup J_{m'}(0)$ and $\bar\pi = \inf J_m(0)$.
By definition of these cascade sets, $p(\underline\pi, \bar b) \le p(\bar\pi, \underline b)$. If all possible actions at $\pi$ led into a
cascade set, then $p(\pi, \underline b) \le \underline\pi$ and $p(\pi, \bar b) \ge \bar\pi$. But these inequalities can only hold with
equality:

$$p(p(\pi, \bar b), \underline b) \ge p(\bar\pi, \underline b) \ge p(\underline\pi, \bar b) \ge p(p(\pi, \underline b), \bar b) = p(p(\pi, \bar b), \underline b)$$

where the outer terms coincide, as Bayes-updating commutes. So, between $J_{m'}(0)$
and $J_m(0)$ there exists at most one point $\hat\pi$ which can satisfy both equations; moreover,
such a point exists iff $m' = m - 1$ and TS holds. Indeed, given TS, we may simply choose
$\hat\pi$ to solve $p(\hat\pi, \bar b) = \bar\pi$, while if TS fails, then with positive chance, a nonextreme signal is
realized, and the posterior $q$ is not in a cascade set. With $\delta > 0$ we have weakly smaller
cascade sets by Step 4 of the Proposition 1 proof, so a $\hat\pi$ failing $(\star)$ is even less likely to exist;
in fact it would further require $\sup J_{m-1}(\delta) = \sup J_{m-1}(0)$ and $\inf J_m(\delta) = \inf J_m(0)$.

Finally, assume TS. Consider any $\hat\pi$ with reachable cascade sets $J_{m-1}(\delta)$ and $J_m(\delta)$.
Then the rule $x$ mapping $\underline b$ into $a_{m-1}$ (low signal to $\underline\pi$) and $\bar b$ into $a_m$ (high signal to $\bar\pi$) is
indeed optimal. By convexity, $v(\hat\pi, \delta)$ is at most the average of $v(\underline\pi, \delta)$ and $v(\bar\pi, \delta)$ (weights
given by transition chances), and $x$ achieves this average. So $v(\cdot, \delta)$ is affine on $(\underline\pi, \bar\pi)$. □
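The commutativity step in the proof of Claim 1 can be checked numerically. The following is a minimal Python sketch, not part of the original argument. It assumes the standard Bayes formula $p(\pi, b) = \pi b / [\pi b + (1-\pi)(1-b)]$ for combining a public belief with a private belief; the function names `bayes_update` and `invert_update` and all numerical values are hypothetical, chosen only for illustration.

```python
def bayes_update(pi: float, b: float) -> float:
    """Posterior public belief from prior pi and private belief b.

    Assumed form (standard Bayes rule for a binary state):
    p(pi, b) = pi*b / (pi*b + (1 - pi)*(1 - b)).
    """
    return pi * b / (pi * b + (1.0 - pi) * (1.0 - b))


def invert_update(target: float, b: float) -> float:
    """Belief pi_hat solving bayes_update(pi_hat, b) == target.

    Updating with 1 - b undoes an update with b, so pi_hat = p(target, 1 - b).
    """
    return bayes_update(target, 1.0 - b)


# Hypothetical numbers: two extreme private beliefs and an upper boundary belief.
b_low, b_high = 0.3, 0.8
pi = 0.45
pi_upper = 0.9

# Updating with the low and then the high belief equals the reverse order.
low_then_high = bayes_update(bayes_update(pi, b_low), b_high)
high_then_low = bayes_update(bayes_update(pi, b_high), b_low)

# Under TS, the candidate exceptional belief solves p(pi_hat, b_high) = pi_upper.
pi_hat = invert_update(pi_upper, b_high)
```

In odds form the update multiplies $\pi/(1-\pi)$ by $b/(1-b)$, which is why the order of the two updates does not matter.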

We now finish proving Lemma 4. From step 2 of Proposition 1's proof, $v(\pi, \delta_1) > v_0(\pi)$
for $\pi$ outside the $\delta_1$-cascade sets. Fix $\pi$ outside the $\delta_2$-cascade sets. If $\pi$ lies in a $\delta_1$-cascade
set we're done, as $v(\pi, \delta_1) = v_0(\pi) < v(\pi, \delta_2)$. Suppose $\pi$ lies outside the $\delta_1$-cascade sets.

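Assume first that $\pi$ satisfies $(\star)$ of Claim 1 for $\delta_1$ (and thus also for $\delta_2$). Then at
least one action $a_m$ is taken with positive chance inducing a belief $q(\pi, \xi^{\delta_1}(\pi), a_m)$ not in
a $\delta_1$-cascade set. Thus, $v(q(\pi, \xi^{\delta_1}(\pi), a_m), \delta_1) > v_0(q(\pi, \xi^{\delta_1}(\pi), a_m))$. Since $\delta_2 > \delta_1$,

$$v(\pi, \delta_1) = (T_{\delta_1} v(\cdot, \delta_1))(\pi) < (T_{\delta_2} v(\cdot, \delta_1))(\pi) \le (T_{\delta_2} v(\cdot, \delta_2))(\pi) = v(\pi, \delta_2) \qquad (4)$$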

Next assume that some $\hat\pi$ between $J_{m-1}(\delta_1)$ and $J_m(\delta_1)$ fails $(\star)$ for $\delta_1$. If (4) holds
at $\hat\pi$, we are done. Assume not. Claim 1 noted that between consecutive cascade sets
such $\hat\pi$ must be unique, and that it implied TS. In that case, (4) holds in a punctured
neighbourhood $(\underline\pi, \hat\pi) \cup (\hat\pi, \bar\pi)$ of $\hat\pi$, where $\underline\pi = \sup J_{m-1}(\delta_1)$ and $\bar\pi = \inf J_m(\delta_1)$. Also, from
the last paragraph of Claim 1's proof, $v(\cdot, \delta_1)$ was everywhere an affine function on $[\underline\pi, \bar\pi]$,
which in turn is a supporting tangent line to the convex function $v(\cdot, \delta_2)$ at $\hat\pi$ (see Step 3).
As it touches $v(\cdot, \delta_2)$ at $\hat\pi$ only, $v(\underline\pi, \delta_2) > v(\underline\pi, \delta_1)$ and $v(\bar\pi, \delta_2) > v(\bar\pi, \delta_1)$.

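To find a lower bound to $v(\hat\pi, \delta_2)$, apply rule $x$ from Claim 1's proof at the belief $\hat\pi$.
Since $x$ maps $\underline b$ into $\underline\pi \in J_{m-1}(\delta_1)$ and $\bar b$ into $\bar\pi \in J_m(\delta_1)$, it yields myopic first-period values
$u_{m-1}(\underline\pi) = v(\underline\pi, \delta_1)$ and $u_m(\bar\pi) = v(\bar\pi, \delta_1)$, and continuation values $v(\underline\pi, \delta_2)$ and $v(\bar\pi, \delta_2)$.
From the right hand side of (1), this mixture is worth strictly more than $v(\hat\pi, \delta_1)$:

$$v(\hat\pi, \delta_1) = \psi(a_{m-1}|\hat\pi, x)\, v(\underline\pi, \delta_1) + \psi(a_m|\hat\pi, x)\, v(\bar\pi, \delta_1)$$

$$< \psi(a_{m-1}|\hat\pi, x)\,[(1-\delta_2) v(\underline\pi, \delta_1) + \delta_2 v(\underline\pi, \delta_2)] + \psi(a_m|\hat\pi, x)\,[(1-\delta_2) v(\bar\pi, \delta_1) + \delta_2 v(\bar\pi, \delta_2)]$$

which is clearly at most $v(\hat\pi, \delta_2)$. Given this contradiction, (4) must hold at $\hat\pi$. □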

A.3 Differentiability

Continuity. In light of the interval structure of Lemma 2, SP simply must determine
the chances $\psi(a_m|\pi)$ with which to choose each action (i.e., what fraction of the signal
space maps into each action). Thus, the choice set is WLOG the compact $M$-simplex $\Delta(A)$,
that is, the same strategy space as in our earlier general observational learning model.
Since the objective function in the Bellman equation (1) is continuous in this choice vector
and in $\pi$, it follows from the Theorem of the Maximum (e.g. Theorem I.B.3 of Hildenbrand
(1974)) that the non-empty optimal rule correspondence is upper hemi-continuous in $\pi$.

Proof of Lemma 5. Assume that $v$ is not differentiable at $\hat\pi$, and that all optimal
rules at $\hat\pi$ are equivalent. We show this leads to a contradiction. In other words, if all
optimal rules at $\hat\pi$ are equivalent, then $v$ is differentiable at $\hat\pi$, as asserted.

Claim 2 For every $\varepsilon > 0$ there exists a neighbourhood of $\hat\pi$ such that any optimal rule at any
belief $\pi$ in it induces action $a_m$ with probability at least $1 - \varepsilon$ in both states $H$, $L$.

Proof: Since $\hat\pi \in J_m(\delta)$, one optimal rule at $\hat\pi$ induces $a_m$ with probability one. This
property is shared by all rules optimal at $\hat\pi$, since they are all equivalent by assumption.
Next, if $\psi(a_m|\pi) = \pi\psi(a_m|H) + (1 - \pi)\psi(a_m|L)$ is near 1, so are both $\psi(a_m|H)$ and $\psi(a_m|L)$.
The claim then follows from the upper hemicontinuity of the optimal rule correspondence. □

Claim 3 For every $N \in \mathbb{N}$ and $\varepsilon > 0$ there exists a neighbourhood of $\hat\pi$ such that, under any
optimal strategy started from a belief $\pi$ in it, action $a_m$ is taken for the first $N$ periods with
probability at least $1 - \varepsilon$ in both states $H$, $L$.

Proof: Fix $\eta < 1/2$. By Claim 2, for $\pi_n$ close enough to $\hat\pi$, action $a_m$ occurs with chance
at least $1 - \eta$ in each state starting from $\pi_n$. If $a_m$ occurs, then the posterior $\pi_{n+1}$ satisfies
$|\pi_{n+1} - \pi_n| < 4\hat\pi(1 - \hat\pi)\eta$, by Bayes rule. So $|\pi_{n+1} - \pi_n|$ can be chosen arbitrarily small
when $a_m$ occurs, provided $\pi_n$ is close enough to $\hat\pi$.

Choose the initial $\pi$ so close to $\hat\pi$ that if $a_m$ occurs for the next $N$ consecutive periods,
the posterior belief stays close enough to $\hat\pi$ that $a_m$ occurs with conditional chance at least
$(1 - \varepsilon)^{1/N}$ each period. In particular, we proceed as follows. Let $p_1 \ne \hat\pi$ be so close to $\hat\pi$
that $p_1(1 - p_1) < 3\hat\pi(1 - \hat\pi)/2$ and at any $\pi$ within $|p_1 - \hat\pi|$ of $\hat\pi$, all optimal rules take
$a_m$ with chance at least $(1 - \varepsilon)^{1/N}$ in each state. Let $\eta_1 = |p_1 - \hat\pi|$ and choose $p_2 \ne \hat\pi$
within $\eta_1/[8\hat\pi(1 - \hat\pi)]$ of $\hat\pi$ and so close to $\hat\pi$ that $a_m$ occurs with chance at least $1 - \eta_1$ in
each state from any $\pi$ within $|p_2 - \hat\pi|$ of $\hat\pi$. Apply this construction iteratively to choose
$\eta_2 = |p_2 - \hat\pi|$ and then $p_3$ likewise, and then $p_4, \ldots, p_N$. If the initial belief $\pi$ lies within
$|p_N - \hat\pi|$ of $\hat\pi$, then it stays within $|p_1 - \hat\pi|$ of $\hat\pi$ in the next $N$ periods provided $a_m$ occurs
in each period. □
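The Bayes-rule bound used in the proof of Claim 3 can be illustrated numerically. The sketch below is a hedged example only: it evaluates the bound at the current belief $\pi_n$ itself rather than at the limit belief, and the function names, grid size, and the values of `pi` and `eta` are hypothetical.

```python
def posterior_after_action(pi: float, psi_h: float, psi_l: float) -> float:
    """Public belief after observing an action taken with chance psi_h in
    state H and psi_l in state L, starting from public belief pi."""
    return pi * psi_h / (pi * psi_h + (1.0 - pi) * psi_l)


def max_belief_shift(pi: float, eta: float, steps: int = 50) -> float:
    """Largest |posterior - pi| over a grid of state-wise chances in [1 - eta, 1]."""
    grid = [1.0 - eta + eta * i / steps for i in range(steps + 1)]
    return max(
        abs(posterior_after_action(pi, h, l) - pi) for h in grid for l in grid
    )


pi, eta = 0.6, 0.1  # hypothetical values, with eta < 1/2 as in the proof
shift = max_belief_shift(pi, eta)
bound = 4.0 * pi * (1.0 - pi) * eta
```

The shift equals $\pi(1-\pi)|\psi_H - \psi_L| / [\pi\psi_H + (1-\pi)\psi_L]$, and with both chances in $[1-\eta, 1]$ and $\eta < 1/2$ the numerator difference is at most $\eta$ while the denominator exceeds $1/2$, so the grid maximum stays below the stated bound.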

We employ the machinery from the proof of Proposition 4. Any optimal strategy
started at belief $\pi$ will yield some state-contingent values $v^L$ and $v^H$. The affine function
$h(p)$ which has $h(0) = v^L$ and $h(1) = v^H$ is then a tangent to the value function at $\pi$.

Since it is optimal to take $a_m$ forever at $\hat\pi$, one tangent to $v$ at $\hat\pi$ is the affine function
$h$ which intersects $u(a_m, L)$ at $\pi = 0$ and $u(a_m, H)$ at $\pi = 1$. Consider the left and right
derivatives of $v$ at $\hat\pi$, with corresponding tangent lines $h_1(p)$ and $h_2(p)$ at belief $p$. One
of those tangents, say $h_1$, must differ from $h$ (when $h_1$ differs, necessarily $m > 1$).
Define $v_1^L = h_1(0) > h(0) = u(a_m, L)$ and $v_1^H = h_1(1) < h(1) = u(a_m, H)$. Since $u(a_1, L) \ge
v_1^L > u(a_m, L)$, a unique $\lambda > 0$ exists satisfying $v_1^L = \lambda u(a_1, L) + (1 - \lambda) u(a_m, L)$.

As $v$ is convex, it is differentiable almost everywhere. So let $\pi_k \uparrow \hat\pi$ be a sequence of
beliefs converging up to $\hat\pi$, with the value function differentiable at each $\pi_k$. The tangent
function is then uniquely determined for each $\pi_k$, and its intercepts at $p = 0, 1$ are the
state-dependent payoffs of any optimal strategy started at $\pi_k$, namely $v^L(\pi_k) \ge v_1^L$ and
$v^H(\pi_k) \le v_1^H$. The inequalities of course follow by convexity of $v$ and $\pi_k < \hat\pi$.

Now choose $N$ so large and $\varepsilon$ so small that $\lambda/2 > 1 - (1 - \delta^N)(1 - \varepsilon)$. Note that
action $a_1$ is strictly the best action in state $L$. Then by Claim 3, for all large enough $k$,
the expected value $v^L(\pi_k)$ in state $L$ of the optimal strategy starting at $\pi_k$ is at most

$$v^L(\pi_k) \le (1 - \delta^N)(1 - \varepsilon)\, u(a_m, L) + [1 - (1 - \delta^N)(1 - \varepsilon)]\, u(a_1, L)$$

$$< (1 - \lambda/2)\, u(a_m, L) + (\lambda/2)\, u(a_1, L)$$

$$< (1 - \lambda)\, u(a_m, L) + \lambda u(a_1, L) = v_1^L \le v^L(\pi_k)$$

since $u(a_1, L) > u(a_m, L)$, as noted above. Contradiction. □
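For concreteness, one way to pick $N$ and $\varepsilon$ satisfying $\lambda/2 > 1 - (1 - \delta^N)(1 - \varepsilon)$ is sketched below in Python. The recipe ($\varepsilon = \lambda/8$ and $\delta^N \le \lambda/4$) is an illustrative assumption, not the authors' choice, and it requires $\delta > 0$; the function name and numerical values are hypothetical.

```python
import math


def choose_horizon_and_tolerance(lam: float, delta: float) -> tuple:
    """Return (N, eps) with lam/2 > 1 - (1 - delta**N) * (1 - eps).

    Illustrative recipe: set eps = lam/8 and pick N with delta**N <= lam/4,
    using 1 - (1 - delta**N)(1 - eps) <= delta**N + eps <= 3*lam/8 < lam/2.
    """
    if not (0.0 < lam <= 1.0 and 0.0 < delta < 1.0):
        raise ValueError("need 0 < lam <= 1 and 0 < delta < 1")
    eps = lam / 8.0
    n = max(1, math.ceil(math.log(lam / 4.0) / math.log(delta)))
    while delta ** n > lam / 4.0:  # guard against floating-point rounding
        n += 1
    return n, eps


lam, delta = 0.2, 0.9  # hypothetical values
N, eps = choose_horizon_and_tolerance(lam, delta)
slack = lam / 2.0 - (1.0 - (1.0 - delta ** N) * (1.0 - eps))
```

A positive `slack` confirms the inequality for the chosen pair; any larger $N$ or smaller $\varepsilon$ would work as well.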

A.4 Proof of Proposition 7

Near $J_m(\delta)$ we should expect to observe action $a_m$. The next lemma states that when
other actions are observed they lead to a drastic revision of beliefs, or there was a
non-negligible probability of observing some other action which would overturn the beliefs.

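Claim 4 (Overturning Principle) For any $\delta \in [0, 1)$, optimal $\xi^\delta$, and $J_m(\delta) \ne \emptyset$,
there exists $\varepsilon > 0$ and an $\varepsilon$-neighbourhood $K \supset J_m(\delta)$, such that $\forall \pi \in K \cap (0, 1)$, either:

(a) $\psi(a_m|\pi, \xi^\delta(\pi)) \ge 1 - \varepsilon$, and $|q(\pi, \xi^\delta(\pi), a_k) - \pi| > \varepsilon$ for all $a_k \ne a_m$ that occur; or

(b) $\psi(a_m|\pi, \xi^\delta(\pi)) < 1 - \varepsilon$, and for some $a \in A$: $\psi(a|\pi, \xi^\delta(\pi)) \ge \varepsilon/M$, $|q(\pi, \xi^\delta(\pi), a) - \pi| > \varepsilon$.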

Proof: First, assume bounded private beliefs. By Step 6 of the proof of Proposition 1,
for $\pi$ close enough to 0 or 1, the only optimal rule is to stop learning. Thus, we need
only consider $\pi$ in some closed subinterval $I$ of $(0, 1)$. For any small enough $\eta > 0$ and
$\pi$ sufficiently close to $J_m(\delta)$, we have for any $k \ne m$: $\psi(a_k|\pi, \xi^\delta(\pi)) < 1 - \eta$. Otherwise,
since the optimal rule correspondence is u.h.c., almost surely taking action $a_k$ is optimal at
some $\pi \in J_m(\delta) \subset J_m$. This is impossible, as $a_k$ incurs a strict myopic loss, and captures
no information gain. Let $\mathrm{co}(\mathrm{supp}(F)) = [\underline b, \bar b]$. By the existence of informative beliefs,
$\underline b < 1/2 < \bar b$. Let $\varepsilon > 0$ be the minimum of $\eta$, $\mu^H([\underline b, (2\underline b + 1)/4])$ and $\mu^L([(2\bar b + 1)/4, \bar b])$.

Assume $\psi(a_m|\pi, \xi^\delta(\pi)) \ge 1 - \varepsilon$ for some $\pi \in I$. Then any action $a_k \ne a_m$ is a.s. only
taken for beliefs within either $[\underline b, (2\underline b + 1)/4]$ or $[(2\bar b + 1)/4, \bar b]$. Any such $a_k$ implies the stated
overturning (selecting, if necessary, $\varepsilon$ even smaller). If instead $\psi(a_m|\pi, \xi^\delta(\pi)) < 1 - \varepsilon$, then
each action is taken with chance less than $1 - \varepsilon$, and so different actions are taken at the
extreme private beliefs (by the threshold structure of the optimal rule). At least one of
the $M$ actions then occurs with chance at least $\varepsilon/M$ and overturns the beliefs, as claimed.

Next consider unbounded private beliefs. Assume that $\pi$ is near the cascade sets $\{0\}$ or
$\{1\}$, say $\pi$ near 0. Let the optimal policy induce with positive chance an action $a_k \ne a_1$
with $q(\pi, \xi^\delta(\pi), a_k)$ near $\pi$. Consider the altered policy that redirects private beliefs from
$a_k$ into $a_1$ instead. When $\pi$ and $q(\pi, \xi^\delta(\pi), a_k)$ are near 0, this yields a boundedly positive
first-period payoff gain and an arbitrarily small loss in future value (for $v$ is continuous in
$q$, which shifts very little, as $a_k$ was nearly uninformative). So the altered policy is a strict
improvement: contradiction. Consequently, any action $a_k \ne a_1$ taken with positive chance
has $|q(\pi, \xi^\delta(\pi), a_k) - \pi| > \varepsilon$ for some $\varepsilon > 0$. □

For the proof of Proposition 7, we first cite the extended (conditional) Second Borel-Cantelli
Lemma in Corollary 5.29 of Breiman (1968): Let $Y_1, Y_2, \ldots$ be any stochastic
process, and $A_n \in \mathcal{F}(Y_1, \ldots, Y_n)$, the induced sigma-field. Then almost surely

$$\{\omega \mid \omega \in A_n \text{ infinitely often (i.o.)}\} = \Big\{\omega \;\Big|\; \sum_{n=1}^{\infty} P(A_{n+1} \mid Y_n, \ldots, Y_1) = \infty\Big\}$$

Fix an optimal policy $\xi^\delta$. Choose $\varepsilon > 0$ to satisfy Claim 4 for all actions $a_1, a_2, \ldots, a_M$.
For fixed $m$, define events $E_n = \{\pi_n \text{ is } \varepsilon\text{-close to } J_m(\delta)\}$, $F_n = \{\psi(a_m|\pi_n, \xi^\delta(\pi_n)) < 1 - \varepsilon\}$,
and $G_{n+1} = \{|\pi_{n+1} - \pi_n| > \varepsilon\}$. If $E_n \cap F_n$ is true, then Claim 4 scenario (b) must obtain,
and therefore $P(G_{n+1}|\pi_n) \ge \varepsilon/M$. So $\sum_{n=1}^{\infty} P(G_{n+1}|\pi_1, \ldots, \pi_n) = \infty$ on the event where
$E_n \cap F_n$ occurs i.o. By the above Borel-Cantelli Lemma, $G_n$ obtains i.o. on that event
almost surely. But since $(\pi_n)$ almost surely converges by Lemma 3, $G_n$ occurs i.o. with
probability zero. By implication, $E_n \cap F_n$ occurs i.o. with probability zero.

Restrict attention to the event $H$ that $(\pi_n)$ converges to a limit in $J_m(\delta)$ and $E_n \cap F_n$
occurs only finitely many times. Then $E_n \cap G_{n+1}^c$ is eventually true on $H$, and thus so is
$E_n \cap F_n^c$. But given $E_n \cap F_n^c$, all actions $a_k \ne a_m$ imply $G_{n+1}$, by the first point in Claim 4.
Perforce, action $a_m$ is eventually taken on event $E_n \cap F_n^c \cap G_{n+1}^c$. Finally, sum over all $m$
to get an event of probability mass one, by Lemma 3 and Proposition 1. □